Abstract

When data traffic in a wireless network is bursty, small amounts of data sporadically become 
available for transmission, at times that are unknown at the receivers, and an extra amount of energy 
must be spent at the transmitters to overcome this lack of synchronization between the network 
nodes. In practice, pre-defined header sequences are used with the purpose of synchronizing the 
different network nodes. However, in networks where relays must be used for communication, the 
overhead required for synchronizing the entire network may be very significant. 

In this work, we study the fundamental limits of energy-efficient communication in an asynchronous
diamond network with two relays. We formalize the notion of relay synchronization by
saying that a relay is synchronized if the conditional entropy of the arrival time of the source message
given the received signals at the relay is small. We show that the minimum energy-per-bit
for bursty traffic in diamond networks is achieved with a coding scheme where each relay is either 
synchronized or not used at all. A consequence of this result is the derivation of a lower bound 
to the minimum energy-per-bit for bursty communication in diamond networks. This bound allows 
us to show that schemes that perform the tasks of synchronization and communication separately 
(i.e., with synchronization signals preceding the communication block) can achieve the minimum 
energy-per-bit to within a constant fraction that ranges from 2 in the synchronous case to 1 in the 
highly asynchronous regime. 

1 Introduction 

Most theoretical studies of wireless networks assume that transmitters and receivers are synchronized, 
in the sense that the receiver knows when data transmission is about to start. This is in general justified 
by the fact that, if large amounts of data are to be transmitted, then the time and energy required for 
synchronization are negligible when compared to what is required for communication itself. Several 
applications, such as Wi-Fi, fall into this category and, in their context, optimizing the time and energy 
required for establishing the connection is of small practical importance. 

However, in certain applications such as wireless sensor networks and bursty data communication 
in cellular networks, small amounts of time-sensitive data are sporadically available for transmission, 
at times that are unknown to the receivers. In such scenarios, the receiver is constantly listening to the 
output of a noisy channel in an attempt to identify a message. An extra amount of energy is then spent at 
the transmitter to make sure that the message is not missed and the noise is not mistaken for the message. 
In the sporadic data model, this extra energy represents a significant part of the total energy spent and 
becomes a relevant quantity. There is a large body of work treating synchronization from a practical 
perspective with the goal of minimizing overheads and synchronization errors. However, these studies
lack a fundamental characterization of the energy and bandwidth costs of synchronization. 

Early work on the fundamental limits of asynchronous communication involved characterizing the 
data rates that can be achieved when the receiver does not know the beginning of the communication 
block [1]. Later, in [2], a similar model was considered, but the performance metric was instead the 
energy (or, in general, the cost) per bit required for reliable asynchronous communication. The characterization
of the minimum energy-per-bit is important from a practical point of view, especially since it
is often the case that the sensors in a wireless sensor network are battery-operated. Thus, in the case of 
short and sporadic transmissions, i.e., bursty traffic, when synchronization costs may in fact dominate 
the communication costs, the characterization of the minimum energy-per-bit is very relevant. 

In this work, we follow the asynchronism model from [2]. However, we focus on the AWGN channel
model, rather than on discrete channels. We assume that $B$ bits of data become available at the source
node at a random arrival time $\nu_B$, and must be communicated to a destination with a maximum delay
$d_B$.$^1$ The arrival time $\nu_B$ is assumed to be unknown to all network nodes, and unknown to the source
before the arrival time itself. However, $\nu_B$ is known to be drawn from $\{1, \ldots, A_B\}$, where $A_B$ quantifies
the asynchronism level. Under this setting, and assuming that $\nu_B$ is drawn uniformly at random from
$\{1, \ldots, A_B\}$, it was shown in [2] that the asynchronous minimum energy-per-bit of a point-to-point
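AWGN channel is given by

$$\epsilon_b^{\text{async}} = \left(1 + \frac{\log A_B}{B}\right) \epsilon_b^{\text{sync}}, \qquad (1)$$

where $\epsilon_b^{\text{sync}} = 2 N_0 \ln 2$ is the minimum energy-per-bit for an AWGN channel with noise power $N_0$ in
the synchronous setting. Our first result is to show that the asynchronous minimum energy-per-bit in (1)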
can be achieved through a scheme where the tasks of synchronization and communication are performed 
separately. In such a scheme, which we refer to as a separation-based coding scheme, as soon as the 
message arrives (at time vb) the source uses a synchronization signal in order to inform the destination 
that the message is about to be transmitted. If this synchronization procedure succeeds, communication 
can then take place as if we were in the synchronous setting. We focus on such separation-based 
schemes due to their ease of design and practical implementation. 

We then move on to the main topic of the paper: asynchronous communication in multi-hop networks.
This is motivated by the fact that multi-hop communication with relays increases network range
and throughput and reduces power consumption. The fundamental question we focus on is: "how should 
relays facilitate the communication between source and destination when they do not know the beginning 
of the transmission block?". On the one hand, one could devise a scheme where relays are constantly 
assuming that communication is taking place. However, this approach would intuitively waste energy 
outside the actual communication block. On the other hand, we could consider a separation-based 
scheme which first synchronizes all relays and the destination, and then proceeds to communicate over a 
synchronous network. However, this may also be potentially wasteful, since the relays are not required 
to decode the message, so they do not need to know the beginning of the transmission block precisely. 
In essence, our goal is to understand whether intermediate relays should be synchronized and whether 
separation-based coding schemes perform well. 

We study this problem in the context of the two-relay diamond network shown in Figure 1. We say
that a coding scheme synchronizes relay $i$ if, intuitively, the signals received by relay $i$ during times
$1, 2, \ldots, A_B, A_B + 1, \ldots, A_B + d_B$, represented by $Y_i^{A_B + d_B}$, reveal a significant amount of information
about $\nu_B$, or, more precisely, if $H(\nu_B \mid Y_i^{A_B + d_B})/B \to 0$ as $B \to \infty$. Under this notion of relay
synchronization, we show that it is optimal from an energy-per-bit point of view to consider coding
schemes that synchronize any relay that is used (i.e., that does not stay silent). This result allows us
to show that, depending on the specific values of $g_1$, $g_2$, $h_1$ and $h_2$, it is optimal from the energy-per-bit
point of view to either only use relay 1, only use relay 2, or use both relay 1 and relay 2. This
result is in contrast with the intuition provided by the synchronous case, in which the capacity (and
also the minimum energy-per-bit) is always improved if we utilize as many relays as are available.
Finally, we utilize the fact that relays must be synchronized to derive a lower bound to the minimum
energy-per-bit for the asynchronous two-relay diamond network. We then verify that the energy-per-bit
achieved by a separation-based scheme is within a constant factor of this lower bound. This factor is 2
in the synchronous case, but it drops towards 1 as the asynchronism-per-bit $(\log A_B)/B$ increases. We
conclude that, in high-asynchronism regimes, where synchronization costs are high, separation-based
schemes perform close to optimally.

$^1$ We index the random arrival time $\nu$ and the delay constraint $d$ by $B$ since we will consider an asymptotic regime in $B$, as
described in Section 3.

Figure 1: Two-relay diamond network.

The paper is organized as follows. In section 2, we summarize some of the previous work on asynchronous
communication. In section 3, we describe our network model and formally define the notion
of the asynchronous minimum energy-per-bit that we use. In section 4, we provide some preliminary
results. First, we describe the known results on the minimum energy-per-bit of point-to-point AWGN
channels. Then we show how similar ideas can be used to derive upper and lower bounds for the minimum
energy-per-bit for the asynchronous diamond network. However, the ratio between these upper
and lower bounds is unbounded, and the remainder of the paper is devoted to improving the lower bound 
(i.e., the converse direction). In section 5, we state our two main results. The first main result, Theorem 
4, essentially states that it is optimal to consider coding schemes where any relay that is used (i.e., does 
not stay silent) must be synchronized. We then state and prove our second main result, Theorem 5, 
which bounds the asynchronous minimum energy-per-bit of the two-relay diamond network. The upper 
and lower bound are then verified to be within a constant fraction of each other. The proof of Theorem 
4 is left to section 7. We then conclude the paper in section 8. 



2 Related Work

The modeling of bursty data traffic used in this work builds on the asynchronism models introduced
in [1] and [2]. In [1], asynchronism is modeled by having the message transmission block start at a 
randomly chosen time within a prescribed window. The receiver knows the transmission window, but 
not the location of the transmission block. The authors consider an asymptotic regime in which the 
size of the window grows exponentially with the number of bits to be transmitted, and they define the 
communication rate as the ratio between the number of transmitted bits and the average time elapsed 
between the beginning of the transmission block and the time when the decoder makes a decision. 
Under this model, in [1], several aspects of the tradeoff between achievable communication rates and 
the asynchronism exponent were characterized. Later on, in a follow-up work [3], the authors drew 
connections between this asynchronism model and the detection and isolation model introduced in [4]. 

The asynchronism model considered in [2] is very similar to the model from [1]. In [2], however, 
the performance metric is the data rate per unit cost, rather than just the data rate. The authors also allow 
for the random variable associated with the beginning of the communication block to have more general
probability distributions (not just the uniform distribution). Their goal is to characterize the maximum 
achievable rate per unit cost, or the capacity per unit cost, which is the inverse of the minimum cost per 
bit. For a discrete channel $p(y|x)$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, and an arbitrary cost
function $k : \mathcal{X} \to [0, \infty]$, they show that the capacity per unit cost is given by

$$C(\beta) = \max_X \min\left\{ \frac{I(X;Y)}{E[k(X)]},\; \frac{I(X;Y) + D(Y \,\|\, Y_\star)}{E[k(X)](1+\beta)} \right\}, \qquad (2)$$

where $Y_\star$ is the random variable corresponding to the output of the channel outside of the transmission
block (i.e., when the transmitter is idle), and $\beta$ is a parameter that characterizes how the uncertainty of
the beginning of the transmission block grows with the number of bits to be sent. In particular, for an
AWGN channel with noise variance $N_0$ and quadratic cost function $k(x) = x^2$, and assuming that the
beginning of the transmission block is drawn uniformly from $\{1, 2, \ldots, 2^{\beta B}\}$, for $\beta > 0$, where $B$ is the
number of bits to be transmitted, this expression reduces to

$$C(\beta) = \frac{1}{2 N_0 \ln 2\,(1+\beta)}. \qquad (3)$$

Notice that, if we define the length of the window to be $A_B = 2^{\beta B}$, (3) implies that, for an AWGN
channel, the asynchronous minimum energy per bit is given by

$$\epsilon_b^{\text{async}} = (1+\beta)\, 2 \ln 2\, N_0 = \left(1 + \frac{\log A_B}{B}\right) \epsilon_b^{\text{sync}},$$

where $\epsilon_b^{\text{sync}} = 2 \ln 2\, N_0$ is the usual (synchronous) minimum energy-per-bit of an AWGN channel. In
addition, the authors of [2] also characterize the basic trade-off between the capacity per unit cost and 
the exponent of the delay within which the decoder must make a decision. 
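As a quick numerical illustration of the AWGN expression above, the following Python sketch evaluates the asynchronous minimum energy-per-bit in its two equivalent forms, $(1+\beta)\,2 \ln 2\, N_0$ and $(1 + \log A_B / B)$ times the synchronous value. The function names and example values are ours and purely illustrative; logarithms of the window length are taken to base 2, which is what the relation $A_B = 2^{\beta B}$ requires.

```python
import math


def sync_energy_per_bit(n0: float) -> float:
    """Synchronous minimum energy-per-bit, 2 * N0 * ln 2 (unit channel gain)."""
    return 2.0 * n0 * math.log(2.0)


def async_energy_per_bit(n0: float, beta: float) -> float:
    """Asynchronous minimum energy-per-bit, (1 + beta) * 2 * N0 * ln 2."""
    return (1.0 + beta) * sync_energy_per_bit(n0)


def async_energy_per_bit_from_window(n0: float, window_length: int, num_bits: int) -> float:
    """Same quantity written as (1 + log2(A_B) / B) * sync value."""
    return (1.0 + math.log2(window_length) / num_bits) * sync_energy_per_bit(n0)


if __name__ == "__main__":
    # Illustrative values only: B = 20 bits, window A_B = 2**10, so beta = 0.5.
    print(async_energy_per_bit(1.0, 0.5))
    print(async_energy_per_bit_from_window(1.0, 2 ** 10, 20))
```

With $\beta = 0$ the two functions reduce to the synchronous value, matching the remark that the synchronous case is recovered when there is no timing uncertainty.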

In [5], the same point-to-point asynchronous model from [1] is considered, but the authors study the 
miss and false alarm error exponents. As a consequence, they are able to characterize the suboptimality
of training-based schemes.

In [6], a strengthened version of the asynchronism model from [1] is proposed, in which the decoder 
needs to estimate both the message and the location of the codeword exactly. It is shown that the asynchronous
capacity region remains unchanged under this formulation. In addition, the finite blocklength
regime is investigated. 



3 Problem Setup 

We consider the diamond network, shown in Figure 1. We assume a discrete-time model where, at time
$t$, each transmitter node $u \in \{S, 1, 2\}$ transmits a real-valued signal $X_u[t]$, each relay $i$, for $i \in \{1, 2\}$,
receives $Y_i[t] = \sqrt{g_i}\, X_S[t] + Z_i[t]$, and the destination $D$ receives $Y_D[t] = \sqrt{h_1}\, X_1[t] + \sqrt{h_2}\, X_2[t] +
Z_D[t]$, where $Z_1[t]$, $Z_2[t]$ and $Z_D[t]$ are independent i.i.d. $\mathcal{N}(0, N_0)$ noise terms.
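The channel model above can be sketched in a few lines of Python. This is only an illustrative simulation of one time step under the stated model (real-valued signals, power gains $g_i$ and $h_i$, Gaussian noise of variance $N_0$); the function name `diamond_step` and its argument layout are our own assumptions, not part of the original formulation.

```python
import math
import random


def diamond_step(x_s, x_1, x_2, g, h, n0, rng):
    """Simulate one time step of the two-relay diamond network.

    x_s, x_1, x_2: transmit signals of the source and of relays 1 and 2.
    g = (g1, g2): power gains source -> relays; h = (h1, h2): relays -> destination.
    n0: noise variance; rng: a random.Random instance (hypothetical interface).
    Returns the received signals (y_1, y_2, y_d).
    """
    sigma = math.sqrt(n0)  # noise standard deviation
    y_1 = math.sqrt(g[0]) * x_s + rng.gauss(0.0, sigma)
    y_2 = math.sqrt(g[1]) * x_s + rng.gauss(0.0, sigma)
    y_d = math.sqrt(h[0]) * x_1 + math.sqrt(h[1]) * x_2 + rng.gauss(0.0, sigma)
    return y_1, y_2, y_d


if __name__ == "__main__":
    rng = random.Random(0)
    print(diamond_step(1.0, 0.0, 0.0, (4.0, 1.0), (9.0, 16.0), 1.0, rng))
```

Setting `n0 = 0` removes the noise, which is a convenient way to check that the gains are applied as square roots of $g_i$ and $h_i$.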

Our bursty traffic model follows the asynchronous communication model introduced in [2]. The
source receives a $B$-bit message $m$ at some random time $\nu \in [1 : A]$, where, for $a < b$, we define
$[a : b] = \{a, a + 1, \ldots, b\}$. The source then needs to communicate this message to the destination with a
delay of at most $d$ time-steps.

In order to formally define reliable communication, we consider the asymptotic regime of $B \to \infty$.
Thus, we consider a sequence of arrival distributions $\{\nu_B\}_{B=1}^{\infty}$, where $\nu_B$ is uniform on $[1 : A_B]$ and
$A_B = 2^{\beta B}$, for $B = 1, 2, \ldots$ and some $\beta > 0$. Notice that, as $B \to \infty$, $B / A_B \to 0$, thus capturing
the idea of short and sporadic messages. Once the $B$ bits arrive at the source at time $\nu_B$, they must
be communicated to the destination within a delay $d_B$. Notice that, in order for the problem to be
meaningful, $d_B$ should be small in comparison to $A_B$. Otherwise, it would be possible to devise a
strategy where the source only starts its transmission at pre-defined time-steps separated by $d_B$ time-steps,
and the traffic would not actually be bursty. Thus, since $A_B$ is exponential in $B$, we will require
the delay $d_B$ to be subexponential in $B$.

An asynchronous code $\mathcal{C}$ for the symmetric diamond network is designed to communicate a specific
number of bits $B$ with a delay of $d_B$, assuming an arrival distribution $\nu_B$. This code is comprised of

• an encoding function for the source $f : [1 : 2^B] \times [1 : A_B] \to \mathbb{R}^{d_B + 1}$, which defines the source
transmit signals for $[\nu_B : \nu_B + d_B]$, given the $B$ message bits and their arrival time $\nu_B$;

• relaying functions $\rho_{i,t} : \mathbb{R}^{t-1} \to \mathbb{R}$, which define relay $i$'s transmit signal at time $t$ given its
received signals in times $1, \ldots, t - 1$, for $t = 1, \ldots, A_B + d_B$;

• a sequential decoder $(\tau, \hat{m})$, which, at time $t$, decides to either decode the message (in which case
it sets $\tau = t$ and outputs a decoded message $\hat{m}$) or to wait (in which case $\tau > t$).

We then have the following definition. 

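Definition 1. Energy-per-bit $\epsilon_b$ is achievable if we can find a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ and a sequence
$\{B_k\}_{k=1}^{\infty}$, with $B_k \to \infty$ as $k \to \infty$, where code $\mathcal{C}_k$ can transmit $B_k$ bits with a maximum delay of $d_{B_k}$,
assuming the arrival distribution is $\nu_{B_k}$, and we have

1. $\lim_{k \to \infty} \Pr(\text{error}(\mathcal{C}_k)) = 0$,

2. $\lim_{k \to \infty} \dfrac{\log d_{B_k}}{B_k} = 0$,

3. $\liminf_{k \to \infty} \dfrac{E[\mathcal{E}_{\mathcal{C}_k}]}{B_k} \le \epsilon_b$,

where $\mathcal{E}_{\mathcal{C}_k} = \sum_{t=1}^{A_k + d_{B_k}} \left( X_S[t]^2 + X_1[t]^2 + X_2[t]^2 \right)$ is the total energy used by code $\mathcal{C}_k$, $A_k = 2^{\beta B_k}$,
and $\text{error}(\mathcal{C}_k)$ is the event $\{\hat{m} \ne m\} \cup \{\tau > \nu_{B_k} + d_{B_k}\}$ for code $\mathcal{C}_k$. The asynchronous minimum
energy-per-bit is the infimum over all achievable energy-per-bit values.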

Constraint 2 is what characterizes the data as time-sensitive, thus requiring the communication to be
in fact bursty. Notice that our definition of achievable energy-per-bit is similar to the ones in [2] (with
delay exponent $\delta = 0$), and in [7] (by setting $\beta = 0$, i.e., in the synchronous case).



4 Preliminary Results 

In this section we first describe some known results for point-to-point AWGN channels. Then we extend 
them in a simple way to the two-relay diamond network, and show that this approach yields upper and 
lower bounds on the asynchronous minimum energy-per-bit which can be arbitrarily far from each other. 

4.1 Point-to-point AWGN Channel 

In the synchronous case, the minimum energy-per-bit of a point-to-point AWGN channel is a special case 
of the inverse of the capacity per unit cost, studied in [8], where the cost is the average power. Consider 
a simple AWGN point-to-point channel, where the channel gain between transmitter and receiver is
$\sqrt{h}$. Let $\epsilon_b^{\text{sync}}$ be the minimum energy-per-bit of this channel in the synchronous setting and $\epsilon_b^{\text{async}}$ be
the minimum energy-per-bit of this channel in the asynchronous setting. The following lemma can be
obtained from the results in [8].

Lemma 1. If $C(P)$ is the capacity of the synchronous AWGN channel with power constraint $P$, then

$$\epsilon_b^{\text{sync}} = \inf_{P > 0} \frac{P}{C(P)} = \frac{\gamma}{h},$$

where we define $\gamma = 2 N_0 \ln 2$.

The importance of Lemma 1 for us is that it guarantees that any energy-per-bit $\epsilon_b > \epsilon_b^{\text{sync}}$ can be
achieved with codes whose delay (i.e., the blocklength) is linear in the number of bits being sent. To see
this, consider any $\epsilon_b > \epsilon_b^{\text{sync}} = \inf_{P > 0} P/C(P)$. We can find $P' > 0$ such that $\epsilon_b > P'/C(P')$, and,
for $\delta > 0$ sufficiently small, we will have $\epsilon_b > \frac{P'}{C(P') - \delta}$. Now, for the rate $R = C(P') - \delta$, we can find a
sequence of (synchronous) codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$, where $\mathcal{C}_k$ transmits $B_k = kR$ bits, with a blocklength equal
to $k$, whose error probabilities go to 0 as $k \to \infty$. Therefore, the energy-per-bit of each code $\mathcal{C}_k$ satisfies

$$\frac{P' k}{B_k} = \frac{P'}{R} = \frac{P'}{C(P') - \delta} < \epsilon_b,$$

and the delay $k = B_k/R$ is linear in $B_k$.
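The infimum in Lemma 1 can be checked numerically. The sketch below assumes the standard capacity formula for a real AWGN channel with power gain $h$ and noise variance $N_0$, namely $C(P) = \frac{1}{2} \log_2(1 + hP/N_0)$ bits per channel use; this formula is not stated explicitly above and is our assumption, as are the function names. Under it, $P/C(P)$ decreases towards $\gamma/h$ as $P \to 0$.

```python
import math


def gamma(n0: float) -> float:
    """gamma = 2 * N0 * ln 2, as defined in Lemma 1."""
    return 2.0 * n0 * math.log(2.0)


def awgn_capacity(power: float, h: float, n0: float) -> float:
    """Assumed capacity in bits per channel use: 0.5 * log2(1 + h * P / N0)."""
    return 0.5 * math.log1p(h * power / n0) / math.log(2.0)


def energy_per_bit(power: float, h: float, n0: float) -> float:
    """Energy-per-bit P / C(P) of a capacity-achieving code at power P."""
    return power / awgn_capacity(power, h, n0)


if __name__ == "__main__":
    h, n0 = 2.0, 1.0  # illustrative values
    for power in (1.0, 0.1, 1e-3, 1e-6):
        print(power, energy_per_bit(power, h, n0))
    print("gamma / h =", gamma(n0) / h)
```

The printed values approach $\gamma/h$ from above, which is why any $\epsilon_b$ strictly larger than $\gamma/h$ can be met with some finite $P' > 0$ and hence with a blocklength that is linear in the number of bits.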

Next, we consider the same AWGN channel, but in the asynchronous setting. We will make an 
additional assumption about the sequence of distributions of $\nu_B$. Let $p_B(t) = \Pr[\nu_B = t]$ and let
$p_B^{\max} = \max_t p_B(t)$. We will require that $p_B^{\max} \to 0$ as $B \to \infty$. Among the sequences of distributions
satisfying this property, we have sequences of distributions whose probability mass functions have the
same "shape" but stretched over the interval $[1 : 2^{\beta B}]$ for each $B$. In particular, this is the case when
$\nu_B$ is uniformly distributed on $\{1, 2, \ldots, 2^{\beta B}\}$, which will be our focus when we consider the diamond
network. By using this restriction, we exclude the distributions for which the expression (6) in [2] does
not evaluate to the normalized entropy. Under this assumption, we can state the following theorem, 
which is similar to the results in [2]. However, since we are not in the discrete alphabet setting which 
is the focus of [2], our achievability scheme is somewhat different, and it introduces the notion of a 
separation-based scheme. We present the achievability proof here, and the converse in Appendix I. 

Theorem 1. For an asynchronous AWGN channel, the minimum energy-per-bit is given by

$$\epsilon_b^{\text{async}} = (1 + H)\, \epsilon_b^{\text{sync}},$$

where $H = \liminf_{k \to \infty} H(\nu_{B_k}) / B_k$. In particular, if each $\nu_{B_k}$ is uniformly distributed in $[1 : 2^{\beta B_k}]$,
then $H = \beta$.

Proof. Achievability: We will show that the asynchronous energy-per-bit $(1 + H)\epsilon_b^{\text{sync}}(1 + \delta)^2$, for an
arbitrarily small $\delta > 0$, is achievable, which implies $\epsilon_b^{\text{async}} \le (1 + H)\epsilon_b^{\text{sync}}$. We will let $\{B_k\}$ be the
subsequence of $1, 2, \ldots$ along which $\lim_{k \to \infty} H(\nu_{B_k})/B_k = H$. We will then build a sequence of codes
$\{\mathcal{C}_k\}_{k=1}^{\infty}$, where code $\mathcal{C}_k$ assumes arrival distribution $\nu_{B_k}$ and transmits $B_k$ bits.

Our scheme is based on having the transmitter send a large pulse as soon as the message arrives. 
The receiver will use a threshold detector to detect the pulse. Once the pulse is (correctly) detected, 
communication can proceed as in a synchronous channel. For code Ck, the total energy available for 
the pulse will be $H(\nu_{B_k})\epsilon_b^{\text{sync}}(1 + \delta)^2 = H(\nu_{B_k})\gamma(1 + \delta)^2/h$. If the message arrives at time $t$ (which
implies $p_{B_k}(t) > 0$), then the transmitter will first send a pulse of magnitude

$$(1 + \delta)\sqrt{\frac{-\gamma \log\left(p_{B_k}(t)\right)}{h}} = (1 + \delta)\sqrt{\frac{-2 N_0 \ln\left(p_{B_k}(t)\right)}{h}}. \qquad (4)$$

Following the pulse, the transmitter sends a codeword from an optimal code designed to send $B_k$ bits
with energy-per-bit $(1 + \delta)^2 \epsilon_b^{\text{sync}} = (1 + \delta)^2 \gamma/h$ over the synchronous version of the channel. Therefore,



the expected energy consumed by our code is given by

$$\begin{aligned}
E[\mathcal{E}_{\mathcal{C}_k}] &= \sum_{t=1}^{A_k} p_{B_k}(t) \left[ -(1 + \delta)^2 \gamma \log\left(p_{B_k}(t)\right)/h + B_k (1 + \delta)^2 \gamma/h \right] \\
&= (1 + \delta)^2 \frac{\gamma}{h} \left[ -\sum_{t=1}^{A_k} p_{B_k}(t) \log p_{B_k}(t) + B_k \right] \\
&= (1 + \delta)^2 \frac{\gamma}{h} \left( H(\nu_{B_k}) + B_k \right),
\end{aligned}$$

where we used the convention that $0 \log 0 = 0$, in order to sum over all $t \in \{1, \ldots, A_k\}$. The energy-per-bit
we will achieve will be $\lim_{k \to \infty} E[\mathcal{E}_{\mathcal{C}_k}]/B_k = (1 + \delta)^2 (1 + H)\gamma/h$, as we intended. All we
need to show is that the probability of error of our codes goes to 0 as $k \to \infty$. Since we are using an
optimal code for the synchronous channel to actually communicate the bits, the probability of incorrect
decoding, given that the pulse was detected, goes to 0 as $k \to \infty$. Moreover, from Lemma 1 we know
that the blocklength required for these codes will be $B_k/R$ for some $R > 0$, which guarantees that the
decoding delay $d_{B_k} = B_k/R + 1$ will satisfy $\lim_{k \to \infty} \log d_{B_k}/B_k = 0$, and the probability of late
decoding also goes to 0 as $k \to \infty$. Thus, we only need to show that the probability of error in detecting
the pulse goes to 0. For this to happen, we will set the detection threshold at the destination to be

$$(1 + \delta/2)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)}$$

at time $t$. We define the following two error events:

• $L_1$ = {Destination does not detect the pulse}

• $L_2$ = {Destination incorrectly detects a pulse before the pulse is sent}

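Clearly, the probability of error in detecting the pulse can be upper bounded by $\Pr(L_1 \cup L_2) \le \Pr(L_1) +
\Pr(L_2)$. We will show that each of the terms goes to 0 as $k \to \infty$. For $L_1$, we have

$$\begin{aligned}
\Pr(L_1) &= \sum_{t=1}^{A_k} p_{B_k}(t) \Pr\left\{ (1 + \delta)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)} + Z(t) < (1 + \delta/2)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)} \right\} \\
&= \sum_{t=1}^{A_k} p_{B_k}(t) \Pr\left\{ Z(t) < -(\delta/2)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)} \right\} \\
&\le \sum_{t=1}^{A_k} p_{B_k}(t) \Pr\left\{ Z(t) < -(\delta/2)\sqrt{-2 N_0 \ln p_{B_k}^{\max}} \right\} \\
&= \Pr\left\{ Z(1) < -(\delta/2)\sqrt{-2 N_0 \ln p_{B_k}^{\max}} \right\},
\end{aligned}$$

and, as $k \to \infty$, we have $p_{B_k}^{\max} \to 0$ and $-\log p_{B_k}^{\max} \to \infty$, implying that $\Pr(L_1) \to 0$. For $L_2$, we have

$$\begin{aligned}
\Pr(L_2) &\le \Pr\left\{ \exists\, t \in \{1, \ldots, A_k\} : Z(t) > (1 + \delta/2)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)} \right\} \\
&\le \sum_{t=1}^{A_k} \Pr\left\{ Z(t) > (1 + \delta/2)\sqrt{-2 N_0 \ln\left(p_{B_k}(t)\right)} \right\} \\
&\le \sum_{t=1}^{A_k} \exp\left( (1 + \delta/2)^2 \ln p_{B_k}(t) \right) \\
&= \sum_{t=1}^{A_k} p_{B_k}(t)^{(1 + \delta/2)^2} \\
&\le \left( p_{B_k}^{\max} \right)^{\delta + \delta^2/4} \sum_{t=1}^{A_k} p_{B_k}(t) = \left( p_{B_k}^{\max} \right)^{\delta + \delta^2/4},
\end{aligned}$$

which goes to 0 as $k \to \infty$, since $p_{B_k}^{\max} \to 0$ as $k \to \infty$.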



□ 
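The energy accounting of the pulse-based scheme can be illustrated with a short Python sketch. It computes the pulse magnitude and the detection threshold used in the achievability proof, and checks numerically that the expected pulse energy equals $(1+\delta)^2 \gamma H(\nu)/h$, with the entropy measured in bits. The function names and the example distribution are our own illustrative choices.

```python
import math


def pulse_magnitude(p_t: float, h: float, n0: float, delta: float) -> float:
    """Transmitted pulse amplitude (1 + delta) * sqrt(-2 N0 ln p(t) / h)."""
    return (1.0 + delta) * math.sqrt(-2.0 * n0 * math.log(p_t) / h)


def detection_threshold(p_t: float, n0: float, delta: float) -> float:
    """Receiver threshold (1 + delta / 2) * sqrt(-2 N0 ln p(t))."""
    return (1.0 + delta / 2.0) * math.sqrt(-2.0 * n0 * math.log(p_t))


def entropy_bits(pmf) -> float:
    """Entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)


def expected_pulse_energy(pmf, h: float, n0: float, delta: float) -> float:
    """Average of the squared pulse amplitude over the arrival distribution."""
    return sum(p * pulse_magnitude(p, h, n0, delta) ** 2 for p in pmf if p > 0.0)


if __name__ == "__main__":
    pmf = [0.5, 0.25, 0.25]  # illustrative arrival distribution
    h, n0, delta = 2.0, 1.5, 0.1
    gamma = 2.0 * n0 * math.log(2.0)
    print(expected_pulse_energy(pmf, h, n0, delta))
    print((1.0 + delta) ** 2 * gamma * entropy_bits(pmf) / h)
```

The received pulse amplitude, $\sqrt{h}$ times the transmitted one, always exceeds the threshold by a margin proportional to $\delta/2$, which is what drives the miss probability to zero.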



4.2 Two-relay Diamond Network 

We now start considering the two-relay diamond network shown in Figure 1 in the asynchronous setting. 
Unless otherwise noted, we will assume throughout the paper (wlog) that $g_2 \le g_1$. Moreover, we will
focus on the case where $\nu_B$ is uniformly distributed on $[1 : 2^{\beta B}]$.

A simple achievable scheme for the two-relay diamond network in Figure 1 is a separation-based 
scheme, similar to the one we used in the achievability of Theorem 1. Thus, we will have a synchronization
phase, where the source will send a pulse at time $\nu_{B_k}$ to synchronize the relays, and the relays will
send a pulse at time $\nu_{B_k} + 1$ to synchronize the destination. After this, provided that the pulses were
correctly detected by all nodes, we are in a synchronous setting, and we will have a communication 
phase. In this phase, any code for the synchronous two-relay diamond network can be used, as long as 
its delay is subexponential in the number of bits being sent. 

To compute an achievable asynchronous energy-per-bit, we will use decode-and-forward for the 
communication phase. Notice that several relaying schemes that outperform decode-and-forward are 
known [9, 10, 11]. However, there is no closed-form expression for the energy-per-bit achieved by these 
schemes, making it difficult to compare their performance to the lower bound. A careful calculation of 
the asynchronous energy-per-bit achieved by this separation-based scheme yields the following theorem, 
whose proof is in Appendix II. 

Theorem 2. The asynchronous minimum energy-per-bit for the network in Figure 1 satisfies 



In order to obtain lower bounds on the asynchronous minimum energy-per-bit, we will use a technique
similar in flavor to cut-set bounds, but applied to minimum energy-per-bit. The idea is to consider
all four cuts in the network in Figure 1, and view it as a MIMO channel, thus being able to apply a lower 
bound to the asynchronous minimum energy-per-bit of a point-to-point channel, as in Theorem 1 . This 
approach yields the following result. 

Theorem 3. The asynchronous minimum energy-per-bit for the network in Figure 1 is lower bounded
as $\epsilon_b^{\min} \ge LB$, where $LB$ is the optimal value of the linear program

$$\begin{aligned}
\text{Maximize} \quad & \gamma(1 + \beta)\left[ y_1 + y_2 + y_3 + y_4 \right] \\
\text{subject to} \quad & (g_1 + g_2) y_1 + g_2 y_3 + g_1 y_4 \le 1 \\
& (h_1 + h_2) y_2 + h_1 y_3 \le 1 \\
& (h_1 + h_2) y_2 + h_2 y_4 \le 1 \\
& y_1, y_2, y_3, y_4 \ge 0.
\end{aligned} \qquad (5)$$

In order to prove this result, we will require the following two results, which bound the asynchronous
minimum energy-per-bit of the MIMO channels obtained when we consider the different cuts. Their
proofs are in Appendices III and IV respectively.

Lemma 2. Consider the networks in Figures 2(a) and 2(b), where the message arrival time $\nu_B$ is uniformly
distributed in $[1 : 2^{\beta B}]$. Then, the minimum asynchronous energy-per-bit $\epsilon_b^{\min}$ of these two
networks is given respectively by

$$\epsilon_b^{\min} = (1 + \beta)\frac{\gamma}{g_1 + g_2} \quad \text{and} \quad \epsilon_b^{\min} = (1 + \beta)\frac{\gamma}{h_1 + h_2}.$$
Lemma 3. Consider the MIMO channel in Figure 3 in the asynchronous setting, where the message
arrival time $\nu_B$ is uniformly distributed in $[1 : 2^{\beta B}]$. Consider a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ that
achieves a finite energy-per-bit, and let $\mathcal{E}_{\mathcal{C}_k}^{(s_i)}$ be the energy spent by code $\mathcal{C}_k$ at the source transmitter $s_i$,
for $i = 1, 2$. Then, we must have

$$a \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s_1)}\right]}{B_k} + b \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s_2)}\right]}{B_k} \ge \gamma(1 + \beta).$$

Proof of Theorem 3. We will use the networks in Figures 2(a) and 2(b) to bound the energy-per-bit 
used by the sources, and the energy-per-bit used by the relays respectively. Notice that these networks 
correspond to two out of the four cuts in our diamond network. 




Figure 2: MIMO channels obtained from the first hop (a) and the second hop (b) of the diamond network in Figure 1.



Now, suppose we have a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit on the
diamond network in Figure 1. Then we consider applying this sequence of codes to the network in Figure
2(a), where we assume the same asynchronism level. In order to do this, we let the source transmit as if
it were in the network in Figure 1. The destination, which has two receive antennas representing the
relays from the original network, will first compute what the transmit signals of the relays would have
been in the original network. Then it will simulate the second hop from the relays to the destination, and
use the same sequential decoder used by the destination in the original network applied to the simulated
received signal. It is clear that the probability of error of this code applied to the network in Figure
2(a) is identical to the probability of error of $\mathcal{C}_k$ on the network in Figure 1. The main difference is that
the energy from the relays is not consumed anymore, and the energy used by code $\mathcal{C}_k$ when applied to
the network in Figure 2(a) is just the energy used by the source, $\mathcal{E}_{\mathcal{C}_k}^{(s)}$. This will allow us to bound the
energy-per-bit used by the source. From Lemma 2, we have

$$\liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} \ge (1 + \beta)\frac{\gamma}{g_1 + g_2}. \qquad (6)$$



Then we consider applying code $\mathcal{C}_k$ to the network in Figure 2(b). This time, the source will simulate
the transmit signals of the source in Figure 1 and the received signals at the relays. Then it can compute
the transmit signals of the relays in Figure 1 and use them, one for each of its antennas. If we let $\mathcal{E}_{\mathcal{C}_k}^{(r_1)}$
and $\mathcal{E}_{\mathcal{C}_k}^{(r_2)}$ be the energy used by the relays in code $\mathcal{C}_k$, Lemma 2 tells us that

$$\liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_1)} + \mathcal{E}_{\mathcal{C}_k}^{(r_2)}\right]}{B_k} \ge (1 + \beta)\frac{\gamma}{h_1 + h_2}. \qquad (7)$$



Up to this point, we have only considered two out of the four possible cuts of the two-relay diamond
network. The other two cuts will yield MIMO channels that look like the network in Figure 3, for $a = h_1$
and $b = g_2$, and $a = g_1$ and $b = h_2$. For a point-to-point channel such as the one in Figure 3, it is not
difficult to see that the asynchronous minimum energy-per-bit would be given by $\gamma(1 + \beta)\min\left(\frac{1}{a}, \frac{1}{b}\right)$.
This is trivially achievable by just using one of the two source antennas ($s_1$ or $s_2$). However, for the
purposes of deriving a tighter lower bound, we will be interested in capturing a relationship between the
energy that is spent in each of the two source antennas. This relationship is stated in Lemma 3.

Figure 3: MIMO channel obtained from the remaining two cuts on the diamond network in Figure 1, by
setting $a = h_1$ and $b = g_2$, or $a = g_1$ and $b = h_2$.

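Recall that, in order to derive (6) and (7), we used the fact that code $\mathcal{C}_k$ can be applied to the networks
in Figures 2(a) and 2(b). Similarly, by applying code $\mathcal{C}_k$ to the network in Figure 3 (with $a = h_1$ and
$b = g_2$, and $a = g_1$ and $b = h_2$), we can use Lemma 3 to obtain

$$h_1 \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_1)}\right]}{B_k} + g_2 \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} \ge \gamma(1 + \beta), \qquad (8)$$

$$g_1 \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} + h_2 \liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_2)}\right]}{B_k} \ge \gamma(1 + \beta). \qquad (9)$$

Next, we notice that (6), (7), (8) and (9) imply that, for any $\delta > 0$, there exists a $k_0$ such that, for $k \ge k_0$,

$$\begin{aligned}
(g_1 + g_2)\frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} &\ge \gamma(1 + \beta) - \delta \\
(h_1 + h_2)\frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_1)} + \mathcal{E}_{\mathcal{C}_k}^{(r_2)}\right]}{B_k} &\ge \gamma(1 + \beta) - \delta \\
h_1 \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_1)}\right]}{B_k} + g_2 \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} &\ge \gamma(1 + \beta) - \delta \\
g_1 \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(s)}\right]}{B_k} + h_2 \frac{E\left[\mathcal{E}_{\mathcal{C}_k}^{(r_2)}\right]}{B_k} &\ge \gamma(1 + \beta) - \delta.
\end{aligned}$$

Therefore, for any $k \ge k_0$, a lower bound to $E\left[\mathcal{E}_{\mathcal{C}_k}\right]/B_k$ is the optimal value of the linear program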



$$\begin{aligned}
\text{Minimize} \quad & x_s + x_{r_1} + x_{r_2} \\
\text{subject to} \quad & (g_1 + g_2) x_s \ge \gamma(1 + \beta) - \delta \\
& (h_1 + h_2)(x_{r_1} + x_{r_2}) \ge \gamma(1 + \beta) - \delta \\
& g_2 x_s + h_1 x_{r_1} \ge \gamma(1 + \beta) - \delta \\
& g_1 x_s + h_2 x_{r_2} \ge \gamma(1 + \beta) - \delta.
\end{aligned} \qquad (10)$$

This implies that, for any $\delta > 0$, the optimal value of the above linear program is also a lower bound to

$$\liminf_{k \to \infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}\right]}{B_k}, \qquad (11)$$

and, after letting $\delta \to 0$, (10) is still a lower bound to (11). Finally, by taking the dual of (10) with
$\delta = 0$, we conclude that (5) is a lower bound to the asynchronous minimum energy-per-bit of the
diamond network in Figure 1. The advantage of using the dual linear program in (5) rather than its
primal is that any feasible solution $(y_1, y_2, y_3, y_4)$ yields a lower bound to the asynchronous minimum
energy-per-bit of the diamond network. □

In order to explicitly compute a gap between the upper and lower bounds, we may consider a worse
lower bound, obtained from the feasible solution to (5) $y_1 = (g_1 + g_2)^{-1}$, $y_2 = (h_1 + h_2)^{-1}$, $y_3 =
0$, $y_4 = 0$. This tells us that

$$LB \ge \gamma(1 + \beta)\left( \frac{1}{g_1 + g_2} + \frac{1}{h_1 + h_2} \right). \qquad (12)$$

The gap between our upper bound and this lower bound is given by

$$(1 + \beta)\gamma\left( \frac{1}{g_2} - \frac{1}{g_1 + g_2} \right). \qquad (13)$$

This result shows that our separation-based scheme performs well in cases where the channel gains of 
the first hop are much stronger than the channel gains of the second hop (since the gap in (13) is small in 
comparison to the lower bound in (12)). However, it is important to realize that our gap depends on f3, 
suggesting that the separation-based scheme may be arbitrarily bad in high-asynchronism regimes (i.e., 
when j3 is large). Notice that, even if we consider the optimal solution to (5), our lower bound is still a 
multiple of (1 + /3) and the gap to the upper bound from Theorem 3 will still depend on (3. 
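As a small numerical illustration of (12) and (13), the sketch below evaluates the loose lower bound and the gap for given channel gains. The function names are hypothetical, and the constant $\gamma$ is simply passed in as a parameter (defaulting to 1 as a normalization assumed here, not a value taken from the paper).

```python
import math


def loose_lower_bound(g1, g2, h1, h2, beta, gamma=1.0):
    """Right-hand side of (12): value of the feasible dual point y3 = y4 = 0."""
    return gamma * (1 + beta) * (1 / (g1 + g2) + 1 / (h1 + h2))


def separation_gap(g1, g2, beta, gamma=1.0):
    """Gap (13) between the separation-based upper bound and (12)."""
    return gamma * (1 + beta) * (1 / g2 - 1 / (g1 + g2))


# The gap scales linearly with (1 + beta), which is the dependence on beta
# discussed in the text.
gap_sync = separation_gap(2.0, 1.0, beta=0.0)
gap_async = separation_gap(2.0, 1.0, beta=3.0)
print(gap_sync, gap_async, math.isclose(gap_async, 4 * gap_sync))
```

Both quantities carry the same factor $(1+\beta)$, so their ratio does not change with $\beta$ even though the absolute gap grows.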



5 Main Results 

Our first main result (Theorem 4) is that a relay can only be helpful in a coding scheme (from the energy-per-bit point of view) if it is synchronized. From this, we can derive our second main result (Theorem 5), which consists of a lower bound for the asynchronous minimum energy-per-bit of the diamond network (tighter than the one in Theorem 3), whose ratio to the upper bound in Theorem 2 is bounded by 2, and decreases to 1 as $\beta$ increases. The proof of Theorem 4 is very technical, and is deferred to Section 7, while the proof of Theorem 5 is presented in this section.
In this work, we will define synchronization as follows.

Definition 2. Relay $i$ is synchronized in the sequence of codes $\{C_k\}_{k=1}^{\infty}$ if

$$\lim_{k\to\infty} \frac{H(\nu_{B_k} \mid \mathbf{Y}_i^{k})}{B_k} = 0,$$

where $\mathbf{Y}_i^{k}$ is the vector of received signals of relay $i$ from time 1 to time $A_k + d_{B_k}$, when using code $C_k$.

Our first main result states that it is optimal (in terms of minimum energy-per-bit) to consider only schemes where we either use both relays and synchronize them, or we just use relay 1 and synchronize it. We rule out the case where only relay 2 is synchronized because, since $g_2 \le g_1$, we have the Markov chain $\nu_{B_k} \leftrightarrow \mathbf{Y}_1^{k} \leftrightarrow \mathbf{Y}_2^{k}$. Thus, we must have $H(\nu_{B_k} \mid \mathbf{Y}_1^{k}) \le H(\nu_{B_k} \mid \mathbf{Y}_2^{k})$, which implies that if relay 2 is synchronized, so is relay 1.
Theorem 4. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $\epsilon_b$ on the asynchronous diamond network in Figure 1. Then we can achieve arbitrarily close to the energy-per-bit $\epsilon_b$ with a sequence of codes $\{C'_k\}$ for which one of the following is true:

(a) Relay $i$, for $i = 1, 2$, can create a list $\Lambda_k^{(r_i)} \subset [1 : A_k]$ based on its received signals, such that $\nu_{B_k} \in \Lambda_k^{(r_i)}$ with vanishing error probability and list size $|\Lambda_k^{(r_i)}|$ subexponential in $B_k$.

(b) Relay 1 can create a list $\Lambda_k^{(r_1)} \subset [1 : A_k]$ based on its received signals, such that $\nu_{B_k} \in \Lambda_k^{(r_1)}$ with vanishing error probability and list size $|\Lambda_k^{(r_1)}|$ subexponential in $B_k$, and relay 2 is inactive (i.e., does not transmit any signal).

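Theorem 4 states that we can assume without loss of generality that any sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit will allow any relay that is used (i.e., any relay that does not stay silent) to create a list $\Lambda_k^{(r_i)} \subset [1 : A_k]$ whose size $|\Lambda_k^{(r_i)}|$ is subexponential in $B_k$ and which contains $\nu_{B_k}$ with vanishing error probability. Therefore, if relay $i$ is used in the sequence of codes $\{C_k\}_{k=1}^{\infty}$, and if we let $\mathbb{1}\{\nu_{B_k} \in \Lambda_k^{(r_i)}\}$ be an indicator function for the event $\nu_{B_k} \in \Lambda_k^{(r_i)}$, then we must have

$$
\begin{aligned}
\frac{H(\nu_{B_k} \mid \mathbf{Y}_i^{k})}{B_k}
&\le \frac{H(\nu_{B_k} \mid \Lambda_k^{(r_i)})}{B_k}
\le \frac{H\big(\nu_{B_k}, \mathbb{1}\{\nu_{B_k} \in \Lambda_k^{(r_i)}\} \mid \Lambda_k^{(r_i)}\big)}{B_k}\\
&\le \frac{1 + H\big(\nu_{B_k} \mid \Lambda_k^{(r_i)}, \mathbb{1}\{\nu_{B_k} \in \Lambda_k^{(r_i)}\}\big)}{B_k}\\
&\le \frac{1 + \log|\Lambda_k^{(r_i)}| + \beta B_k \Pr\big(\nu_{B_k} \notin \Lambda_k^{(r_i)}\big)}{B_k},
\end{aligned}
$$

which goes to 0 as $k \to \infty$, because $|\Lambda_k^{(r_i)}|$ is subexponential in $B_k$, and $\Pr(\nu_{B_k} \notin \Lambda_k^{(r_i)}) \to 0$ as $k \to \infty$. Thus, we have just shown the following.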

Corollary 1. It is possible to achieve the minimum energy-per-bit of the asynchronous diamond network 
in Figure 1 with codes where each relay is either synchronized or remains silent. 
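To make the argument leading to Corollary 1 concrete, the sketch below evaluates the final upper bound $(1 + \log|\Lambda| + \beta B \Pr(\nu \notin \Lambda))/B$ for a growing number of bits $B$. The specific choices (a list size growing linearly in $B$ and a miss probability of $1/B$) are illustrative assumptions, not values from the paper; logarithms are taken base 2.

```python
import math


def sync_entropy_bound(B, list_size, beta, p_miss):
    """Upper bound (1 + log2|list| + beta*B*p_miss) / B on H(nu | Y) / B."""
    return (1 + math.log2(list_size) + beta * B * p_miss) / B


# Illustrative regime: list size linear in B, miss probability 1/B.
beta = 2.0
for B in (2 ** 4, 2 ** 8, 2 ** 12):
    print(B, sync_entropy_bound(B, list_size=B, beta=beta, p_miss=1.0 / B))
```

Any subexponential list size gives $\log|\Lambda| / B \to 0$, so the bound vanishes as long as the miss probability also vanishes.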

In the remainder of this section, we show how Theorem 4 can be used to improve our lower bound, and, in Section 7, we prove Theorem 4. We will need some facts related to the capacity of a two-user degraded broadcast channel. Let $\mathcal{C}(P)$ be the capacity region of a degraded broadcast channel $X \leftrightarrow Y_1 \leftrightarrow Y_2$. We know that this capacity region consists of all pairs $(R_1, R_2)$ such that

$$
\begin{aligned}
R_1 &\le I(X; Y_1 \mid U)\\
R_2 &\le I(U; Y_2),
\end{aligned}
$$

for some distribution $p(u, x)$ such that $E[\|X\|^2] \le P$, where $R_2$ corresponds to the common rate, and $R_1$ to the private rate to the stronger user. However, we will be interested in the multi-letter characterization of the same region; i.e., all pairs $(R_1, R_2)$ such that

$$
\begin{aligned}
R_1 &\le \frac{1}{n} I(X^n; Y_1^n \mid U)\\
R_2 &\le \frac{1}{n} I(U; Y_2^n),
\end{aligned}
\qquad (14)
$$

for some $n$ and some distribution $p(u, x^n)$ such that $E[\|X^n\|^2] \le nP$. An important quantity for us will be the $(1 : \gamma)$-capacity of this broadcast channel, which we define as

$$C_{1:\gamma}(P) = \max\{R : (R, \gamma R) \in \mathcal{C}(P)\},$$

for some $\gamma > 0$. Using the multi-letter description of the capacity (14), it is easy to see that we have

$$C_{1:\gamma}(P) = \sup_{n,\; p(u,x^n):\, E\|X^n\|^2 \le nP} \min\left\{ \frac{I(X^n; Y_1^n \mid U)}{n},\; \frac{I(U; Y_2^n)}{\gamma n} \right\}. \qquad (15)$$

Now we can state our new lower bound.

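Theorem 5. The asynchronous minimum energy-per-bit for the network in Figure 1 is lower bounded as

$$\epsilon_b \ge \min\left\{ LB_2,\; \gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right) \right\},$$

where $LB_2$ is the optimal solution to

$$
\begin{aligned}
\text{Maximize}\quad & \gamma\left[ y_1\left(1 + \beta\,\frac{g_1+g_2}{g_2}\right) + (1+\beta)(y_2 + y_3 + y_4) \right]\\
\text{subject to}\quad & (g_1+g_2)\,y_1 + g_2 y_3 + g_1 y_4 \le 1\\
& (h_1+h_2)\,y_2 + h_1 y_3 \le 1\\
& (h_1+h_2)\,y_2 + h_2 y_4 \le 1\\
& y_1, y_2, y_3, y_4 \ge 0.
\end{aligned}
\qquad (16)
$$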



Proof. First we assume that $\{C_k\}_{k=1}^{\infty}$ falls into case (a) of Theorem 4, and both relays are synchronized. In this case, we will consider using code $C_k$ on the degraded broadcast channel in Figure 4, in the synchronous setting. Notice that, for this broadcast channel, $Y_2 = \sqrt{g_2}\,X + Z_2$ is a scalar, while $Y_1 = [Y_{1,1}\;\; Y_{1,2}]^T$ is a vector. When we consider using code $C_k$ on this channel in the synchronous setting, we will have the source choosing an arrival time $\nu_{B_k}$ uniformly at random from $[1 : 2^{\beta B_k}]$ and transmitting the message as if we were in the asynchronous setting. Notice that destination $D_1$ can simulate what the relays would have done in the diamond network, thus being able to simulate what the destination from the diamond network would have received. This guarantees that, with probability $1 - \epsilon_k$, where $\epsilon_k \to 0$, destination $D_1$ can decode $m$ correctly and output a list $[\tau - d_{B_k} : \tau]$ containing $\nu_{B_k}$.

Figure 4: Degraded broadcast channel used in the proof of Theorem 5.

We will let $X^{A_k}$ be the random vector corresponding to the transmit signals of the source when using code $C_k$ on the broadcast channel, and $Y_1^{A_k}$ and $Y_2^{A_k}$ be the corresponding outputs at $D_1$ and $D_2$ respectively. Since we are assuming that relay 2 is synchronized in the diamond network, destination $D_2$ will be synchronized here, which implies that

$$\lim_{k\to\infty} \frac{H(\nu_{B_k} \mid Y_2^{A_k})}{B_k} = 0. \qquad (17)$$
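Now, using (15) with $U = \nu_{B_k}$, we obtain

$$
\begin{aligned}
C_{1:\beta}\!\left(\frac{E\|X^{A_k}\|^2}{A_k}\right)
&\ge \frac{1}{A_k} \min\left\{ I(X^{A_k}; Y_1^{A_k} \mid \nu_{B_k}),\; I(\nu_{B_k}; Y_2^{A_k})/\beta \right\}\\
&= \frac{1}{A_k} \min\left\{ B_k - H(X^{A_k} \mid Y_1^{A_k}, \nu_{B_k}),\; B_k - H(\nu_{B_k} \mid Y_2^{A_k})/\beta \right\}\\
&\overset{(i)}{\ge} \frac{1}{A_k} \min\left\{ B_k - \big(H(\epsilon_k) + \epsilon_k B_k\big),\; B_k - H(\nu_{B_k} \mid Y_2^{A_k})/\beta \right\},
\end{aligned}
\qquad (18)
$$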



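where (i) follows from Fano's inequality. Next, we notice that the capacity $\mathcal{C}(P)$ of the Gaussian degraded broadcast channel is known in closed form, and in the case of Figure 4, it is comprised of all non-negative pairs $(R_1, R_2)$ satisfying

$$
\begin{aligned}
R_1 &\le \frac{1}{2} \log\left(1 + \alpha (g_1+g_2) P / N\right)\\
R_2 &\le \frac{1}{2} \log\left(1 + \frac{(1-\alpha)\, g_2 P}{N + \alpha g_2 P}\right),
\end{aligned}
$$

for some $\alpha \in [0, 1]$. It is then not difficult to see that the $(1 : \beta)$-capacity of our broadcast channel can be expressed as

$$C_{1:\beta}(P) = \max_{0 \le \alpha \le 1} \min\left\{ \frac{1}{2}\log\left(1 + \alpha(g_1+g_2)P/N\right),\; \frac{1}{2\beta}\log\left(1 + \frac{(1-\alpha)\, g_2 P}{N + \alpha g_2 P}\right) \right\}.$$

Then we have

$$
\begin{aligned}
\sup_{P>0} \frac{C_{1:\beta}(P)}{P}
&= \sup_{P>0} \frac{1}{P} \max_{0\le\alpha\le1} \min\left\{ \frac{1}{2}\log\left(1 + \alpha(g_1+g_2)P/N\right),\; \frac{1}{2\beta}\log\left(1 + \frac{(1-\alpha)\, g_2 P}{N + \alpha g_2 P}\right) \right\}\\
&\le \max_{0\le\alpha\le1} \min\left\{ \sup_{P>0} \frac{\log\left(1 + \alpha(g_1+g_2)P/N\right)}{2P},\; \sup_{P>0} \frac{\log\left(1 + \frac{(1-\alpha)\, g_2 P}{N + \alpha g_2 P}\right)}{2\beta P} \right\}\\
&= \max_{0\le\alpha\le1} \min\left\{ \frac{\alpha(g_1+g_2)}{\gamma},\; \frac{(1-\alpha)\, g_2}{\beta\gamma} \right\}\\
&= \frac{g_2(g_1+g_2)}{\gamma\left[\beta(g_1+g_2) + g_2\right]}.
\end{aligned}
\qquad (19)
$$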
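The last step of (19) balances the two terms inside the minimum. The sketch below checks that step numerically: it compares a grid search over $\alpha$ with the closed form $g_2(g_1+g_2)/(\gamma[\beta(g_1+g_2)+g_2])$. The function names are hypothetical, $\gamma$ is normalized to 1 by default as an assumption, and $\beta > 0$ is required.

```python
import math


def slope(alpha, g1, g2, beta, gamma=1.0):
    """min{alpha(g1+g2)/gamma, (1-alpha)g2/(beta*gamma)} from the third line of (19)."""
    return min(alpha * (g1 + g2) / gamma, (1 - alpha) * g2 / (beta * gamma))


def optimal_alpha(g1, g2, beta):
    """Power split that equalises the two terms of the minimum."""
    return g2 / (beta * (g1 + g2) + g2)


def closed_form_slope(g1, g2, beta, gamma=1.0):
    """Final expression in (19)."""
    return g2 * (g1 + g2) / (gamma * (beta * (g1 + g2) + g2))


g1, g2, beta = 2.0, 1.0, 0.5
grid_max = max(slope(a / 1000, g1, g2, beta) for a in range(1001))
print(grid_max, closed_form_slope(g1, g2, beta), optimal_alpha(g1, g2, beta))
```

The first term increases in $\alpha$ and the second decreases, so the maximum of the minimum is attained where they cross.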

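Now, by combining (18) and (19), we conclude that

$$
\frac{g_2(g_1+g_2)}{\gamma\left[\beta(g_1+g_2) + g_2\right]}
\ge \sup_{P>0} \frac{C_{1:\beta}(P)}{P}
\ge \frac{C_{1:\beta}\!\left(E\|X^{A_k}\|^2 / A_k\right)}{E\|X^{A_k}\|^2 / A_k}
\ge \frac{\min\left\{ B_k - \big(H(\epsilon_k) + \epsilon_k B_k\big),\; B_k - H(\nu_{B_k} \mid Y_2^{A_k})/\beta \right\}}{E\|X^{A_k}\|^2}.
$$

Since $E\|X^{A_k}\|^2$ is the expected energy $E[\mathcal{E}^{(s)}_{C_k}]$ spent by the source, by taking the limsup when $k \to \infty$ and using (17), we obtain

$$\frac{g_2(g_1+g_2)}{\gamma\left[\beta(g_1+g_2) + g_2\right]} \ge \limsup_{k\to\infty} \frac{B_k}{E[\mathcal{E}^{(s)}_{C_k}]},$$

that is,

$$\liminf_{k\to\infty} \frac{E[\mathcal{E}^{(s)}_{C_k}]}{B_k} \ge \frac{\gamma\left[\beta(g_1+g_2) + g_2\right]}{g_2(g_1+g_2)} = \gamma\left(\frac{\beta}{g_2} + \frac{1}{g_1+g_2}\right). \qquad (20)$$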
For the other three cuts in the network, we use the same analysis that we used in the proof of Theorem 3 to obtain (7), (8) and (9). Then, by following very similar steps to those in the proof of Theorem 3, we conclude that a lower bound to

$$\liminf_{k\to\infty} \frac{E[\mathcal{E}_{C_k}]}{B_k}$$

is given by the optimal solution to the linear program

$$
\begin{aligned}
\text{Minimize}\quad & x_s + x_{r_1} + x_{r_2}\\
\text{subject to}\quad & (g_1+g_2)\,x_s \ge \gamma\left(1 + \beta\,\frac{g_1+g_2}{g_2}\right)\\
& (h_1+h_2)(x_{r_1}+x_{r_2}) \ge \gamma(1+\beta)\\
& g_2 x_s + h_1 x_{r_1} \ge \gamma(1+\beta)\\
& g_1 x_s + h_2 x_{r_2} \ge \gamma(1+\beta)\\
& x_s, x_{r_1}, x_{r_2} \ge 0.
\end{aligned}
\qquad (21)
$$

Then, by taking the dual of (21) we obtain (16), which concludes the proof in the case where both relays are synchronized.

If the sequence of codes falls into case (b) of Theorem 4, we may assume that only relay 1 is synchronized and relay 2 is silent. Then the analysis is much simpler. We essentially have two concatenated point-to-point asynchronous AWGN channels, in which case the asynchronous minimum energy-per-bit is exactly given by

$$\gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right), \qquad (22)$$

and the theorem follows. □
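The bound of Theorem 5 can be evaluated numerically for given gains. The following is a minimal sketch, not the authors' procedure: it solves the small linear program (16) exactly by enumerating vertices of the feasible polytope (four variables, seven constraints), using rational arithmetic. The function names are hypothetical, $\gamma$ defaults to 1 as a normalization assumed here, and all gains are assumed strictly positive so that the feasible region is bounded.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(rows, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(rows)
    m = [list(r) + [v] for r, v in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def lb2(g1, g2, h1, h2, beta, gamma=1):
    """Optimal value of the linear program (16), found by vertex enumeration."""
    g1, g2, h1, h2, beta, gamma = (Fraction(v) for v in (g1, g2, h1, h2, beta, gamma))
    zero, one = Fraction(0), Fraction(1)
    # Rows of A y <= b: the three constraints of (16), then -y_j <= 0.
    A = [
        [g1 + g2, zero, g2, g1],
        [zero, h1 + h2, h1, zero],
        [zero, h1 + h2, zero, h2],
    ] + [[-one if i == j else zero for j in range(4)] for i in range(4)]
    b = [one, one, one, zero, zero, zero, zero]
    c = [gamma * (1 + beta * (g1 + g2) / g2)] + [gamma * (1 + beta)] * 3
    best = None
    for active in combinations(range(7), 4):
        y = solve_square([A[i] for i in active], [b[i] for i in active])
        if y is None:
            continue
        feasible = all(sum(a * v for a, v in zip(row, y)) <= bound
                       for row, bound in zip(A, b))
        if feasible:
            value = sum(ci * yi for ci, yi in zip(c, y))
            if best is None or value > best:
                best = value
    return best


def theorem5_lower_bound(g1, g2, h1, h2, beta, gamma=1):
    """min{LB2, gamma(1+beta)(1/g1 + 1/h1)} as in the statement of Theorem 5."""
    relay1_only = Fraction(gamma) * (1 + Fraction(beta)) * (1 / Fraction(g1) + 1 / Fraction(h1))
    return min(lb2(g1, g2, h1, h2, beta, gamma), relay1_only)


print(theorem5_lower_bound(1, 1, 1, 1, beta=1))
```

Vertex enumeration is only practical because the program is tiny; for larger instances a general-purpose LP solver would be the natural choice.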



6 Implications of the Main Results 

The result in Theorem 5 allows us to characterize the asynchronous minimum energy-per-bit of the diamond network to within a constant ratio, which ranges from 2 in the synchronous case to 1 in the highly asynchronous case. First notice that, if we just use relay 1 (the stronger relay), we can achieve energy-per-bit $\gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right)$. Therefore, in cases where

$$\gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right) \le LB_2, \qquad (23)$$

the optimal strategy for the two-relay diamond network is to just use relay 1. In these cases, there is no gap between upper and lower bound. In cases where

$$\gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right) > LB_2, \qquad (24)$$

it is clear that $LB_2$ is a lower bound on the asynchronous minimum energy-per-bit for any sequence of codes (independent of which relays are synchronized). Therefore, if we compare it to the simple upper bound from Theorem 2, and using the feasible solution to (16) $y_1 = (g_1+g_2)^{-1}$, $y_2 = (h_1+h_2)^{-1}$, $y_3 = 0$, $y_4 = 0$, the gap will satisfy

$$
\begin{aligned}
(1+\beta)\,\gamma\left(\frac{1}{g_2} + \frac{1}{h_1+h_2}\right) - LB_2
&\le (1+\beta)\,\gamma\left(\frac{1}{g_2} + \frac{1}{h_1+h_2}\right) - \gamma\left(\frac{\beta}{g_2} + \frac{1}{g_1+g_2} + \frac{1+\beta}{h_1+h_2}\right)\\
&= \gamma\left(\frac{1}{g_2} - \frac{1}{g_1+g_2}\right).
\end{aligned}
$$



Notice that this gap does not depend on $\beta$ anymore. Therefore, we conclude that the separation-based scheme, although suboptimal, has a performance that does not become worse for large $\beta$; it in fact becomes relatively better. An important outcome of these results is that, for some values of $g_1$, $g_2$, $h_1$ and $h_2$, we can decide whether it is optimal to use one or both relays. As we observed before, in the cases where (23) holds, it is optimal to only use relay 1. The intuition is that the cost of using relay 2 is high, since it must be synchronized in order to be useful, and using it does not improve the achievable energy-per-bit. On the other hand, in cases where our upper bound is below the lower bound when using only relay 1, i.e., when

$$\frac{1}{g_1} + \frac{1}{h_1} > \frac{1}{g_2} + \frac{1}{h_1+h_2}, \qquad (25)$$

we know that the optimal strategy involves using both relays, and paying the price for synchronizing them. The plots in Figure 5 illustrate these results. For a given value of $\beta$ and for fixed positions of the source, relay 1 and the destination, we show in green the positions of relay 2 for which it would be optimal to use both relays, in red the region where it would be optimal to use only relay 1, in blue the region where it would be optimal to use only relay 2, and in yellow the region for which our result does not provide an answer. To create these plots, we assumed that the channel gains are proportional to the cube of the inverse of the distance.

Figure 5: Optimal relay selection for a fixed position of source, destination and relay 1, and a varying position of relay 2. We show in green the positions of relay 2 for which it would be optimal to use both relays, in red the region where it would be optimal to use only relay 1, in blue the region where it would be optimal to use only relay 2, and in yellow the region for which our result does not provide an answer. The channel gains are assumed to be inversely proportional to the cube of the distance.
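The sufficient condition (25) for using both relays is easy to evaluate for a given geometry. The sketch below is an illustration under stated assumptions: node positions are hypothetical points in the plane, gains follow the cubic path-loss model mentioned in the text with proportionality constant 1, and relay 1 is assumed to be the relay with the stronger first hop ($g_2 \le g_1$). It only tests (25); deciding the other regions of Figure 5 would also require $LB_2$.

```python
import math


def path_loss_gain(p, q, exponent=3):
    """Channel gain between points p and q, taken as distance ** (-exponent)."""
    return math.dist(p, q) ** (-exponent)


def both_relays_provably_better(g1, g2, h1, h2):
    """Condition (25): the two-relay upper bound beats the relay-1-only value."""
    return 1 / g1 + 1 / h1 > 1 / g2 + 1 / (h1 + h2)


source, destination, relay1 = (0.0, 0.0), (2.0, 0.0), (1.0, 0.0)
for relay2 in [(1.0, 0.1), (1.0, 5.0)]:
    g1, h1 = path_loss_gain(source, relay1), path_loss_gain(relay1, destination)
    g2, h2 = path_loss_gain(source, relay2), path_loss_gain(relay2, destination)
    print(relay2, both_relays_provably_better(g1, g2, h1, h2))
```

A relay 2 placed close to relay 1 satisfies (25), while a distant relay 2 does not, which matches the intuition that a weak second relay is not worth synchronizing.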

Next, we consider the ratio upper-bound/lower-bound. The upper bound that we use is simply the minimum between a decode-and-forward scheme using only relay 1 and a decode-and-forward scheme using both relays, i.e.,

$$\min\left\{ \gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right),\; \gamma(1+\beta)\left(\frac{1}{g_2} + \frac{1}{h_1+h_2}\right) \right\}. \qquad (26)$$

As shown in [12], in the special case where the first hop of the diamond network is symmetric, i.e., when $g_1 = g_2 = g$, the upper and lower bounds are within a constant factor of each other. To see this, notice that, if $g_1 = g_2 = g$, the upper bound always reduces to $\gamma(1+\beta)(1/g + 1/(h_1+h_2))$. Moreover, by computing the bound (16) with $y_1 = (2g)^{-1}$, $y_2 = (h_1+h_2)^{-1}$, $y_3 = 0$, $y_4 = 0$, we obtain the lower bound of $\gamma(\beta/g + 1/(2g) + (1+\beta)/(h_1+h_2))$. We then have

$$\frac{\gamma(1+\beta)\left(\frac{1}{g} + \frac{1}{h_1+h_2}\right)}{\gamma\left(\frac{\beta}{g} + \frac{1}{2g} + \frac{1+\beta}{h_1+h_2}\right)} \le \frac{(1+\beta)\frac{1}{g}}{\frac{\beta}{g} + \frac{1}{2g}} = \frac{1+\beta}{\frac{1}{2}+\beta}.$$

Therefore, if $g_1 = g_2$, separation-based schemes achieve to within a factor of $(1+\beta)/(\frac{1}{2}+\beta)$ from the minimum energy-per-bit. This ratio equals 2 when $\beta = 0$ (i.e., in the synchronous case) but it decreases towards 1 as $\beta$ increases.
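The symmetric-case bound can be checked numerically. The sketch below computes the ratio of the upper bound to the feasible-point lower bound for $g_1 = g_2 = g$ and compares it with the cap $(1+\beta)/(\frac{1}{2}+\beta)$; the function names are hypothetical, and $\gamma$ cancels in the ratio so it is omitted.

```python
def symmetric_ratio(g, h1, h2, beta):
    """Upper bound over feasible-point lower bound when g1 = g2 = g."""
    upper = (1 + beta) * (1 / g + 1 / (h1 + h2))
    lower = beta / g + 1 / (2 * g) + (1 + beta) / (h1 + h2)
    return upper / lower


def symmetric_ratio_cap(beta):
    """The bound (1 + beta) / (1/2 + beta) derived in the text."""
    return (1 + beta) / (0.5 + beta)


for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, symmetric_ratio(1.0, 0.5, 2.0, beta), symmetric_ratio_cap(beta))
```

The cap equals 2 at $\beta = 0$ and approaches 1 for large $\beta$, and the actual ratio stays below it for every choice of second-hop gains.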

In the general case, however, finding a good analytical bound on the worst-case ratio between the upper and lower bounds is not as easy. As we noticed before, if (23) holds, then the gap between upper and lower bound is zero, and the ratio is one. Therefore, we may assume that, for the worst-case ratio, (24) holds, and by plugging $y_1 = (g_1+g_2)^{-1}$, $y_2 = (h_1+h_2)^{-1}$, $y_3 = 0$, $y_4 = 0$ into (16), we have the lower bound

$$\gamma\left(\frac{\beta}{g_2} + \frac{1}{g_1+g_2} + \frac{1+\beta}{h_1+h_2}\right).$$

Then, an upper bound to the worst-case ratio is

$$\frac{\gamma(1+\beta)\left(\frac{1}{g_2} + \frac{1}{h_1+h_2}\right)}{\gamma\left(\frac{\beta}{g_2} + \frac{1}{g_1+g_2} + \frac{1+\beta}{h_1+h_2}\right)} \le \frac{(1+\beta)\frac{1}{g_2}}{\frac{\beta}{g_2} + \frac{1}{g_1+g_2}} \le \frac{(1+\beta)\frac{1}{g_2}}{\frac{\beta}{g_2}} = 1 + \frac{1}{\beta}.$$

This clearly shows that, as $\beta \to \infty$, the worst-case ratio tends to 1. However, this bound tends to infinity when $\beta \to 0$. To verify that this is not the case for the worst-case ratio, we consider two regimes.

(i) $h_1 \le g_2$. By considering only the second term in (26), and the lower bound provided by plugging

$$y_1 = 0, \quad y_2 = \frac{1}{2(h_1+h_2)}, \quad y_3 = \frac{1}{2}\min\left(\frac{1}{h_1}, \frac{1}{g_2}\right) = \frac{1}{2g_2}, \quad y_4 = 0$$

into (16), we can upper bound the worst-case ratio as

$$\frac{\gamma(1+\beta)\left(\frac{1}{g_2} + \frac{1}{h_1+h_2}\right)}{\gamma(1+\beta)\left(\frac{1}{2g_2} + \frac{1}{2(h_1+h_2)}\right)} = 2.$$

(ii) $h_1 > g_2$. By considering only the first term in (26), and the lower bound provided by plugging

$$y_1 = \frac{h_1 - g_2}{h_1(g_1+g_2)}, \quad y_2 = 0, \quad y_3 = \min\left(\frac{1}{h_1}, \frac{1}{g_2}\right) = \frac{1}{h_1}, \quad y_4 = 0$$

into (16), we can upper bound the worst-case ratio as

$$
\begin{aligned}
\frac{\gamma(1+\beta)\left(\frac{1}{g_1} + \frac{1}{h_1}\right)}{\gamma\left[\frac{h_1-g_2}{h_1(g_1+g_2)}\left(1 + \beta\,\frac{g_1+g_2}{g_2}\right) + (1+\beta)\frac{1}{h_1}\right]}
&\le \frac{\frac{1}{g_1} + \frac{1}{h_1}}{\frac{h_1-g_2}{h_1(g_1+g_2)} + \frac{1}{h_1}}
= \frac{\frac{h_1+g_1}{h_1 g_1}}{\frac{h_1+g_1}{h_1(g_1+g_2)}}\\
&= 1 + \frac{g_2}{g_1} \le 2.
\end{aligned}
$$

We conclude that, in the worst case, the upper bound in (26) and the lower bound from Theorem 5 are within a factor of 2 of each other, and that this factor goes to 1 as $\beta \to \infty$. Since this bound is very crude, we considered finding the approximate worst-case ratio between the upper bound in (26) and the lower bound from Theorem 5 numerically. First we notice that for any choice of $g_1, g_2, h_1, h_2$, if we normalize all the channel gains by $\max(g_1, g_2, h_1, h_2)$ we obtain the same ratio between upper bound and lower bound. Therefore, we may restrict our search for the worst-case ratio to the case where all channel gains lie in $[0, 1]$. Thus, we considered the ratio between upper bound and lower bound for $g_1, g_2, h_1, h_2 \in \{1/30, 2/30, \ldots, 30/30\}$, and found the worst case for several values of $\beta$. We obtained the plot in Figure 6. This plot confirms that the worst-case ratio is uniformly upper bounded by 2, and decreases to 1 as $\beta$ increases. The ratio decreases to 1 faster than $1 + 1/\beta$, but not as fast as $(1+\beta)/(1/2+\beta)$ (the case where $g_1 = g_2$).

Figure 6: Worst-case upper bound to lower bound ratio. (Curves: numerical worst-case ratio; worst-case ratio when $g_1 = g_2$; outer bound for worst-case ratio.)
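A grid search of this kind can be sketched as follows. This is not the authors' numerical procedure: instead of solving (16) exactly, the sketch uses a cheaper proxy lower bound built from the feasible points used in the analysis above, so the ratios it reports are upper bounds on the true worst-case ratio. The grid resolution, the function names, the normalization $\gamma = 1$, and the restriction $g_2 \le g_1$ (relay 1 has the stronger first hop) are assumptions of this example.

```python
from itertools import product


def upper_bound(g1, g2, h1, h2, beta):
    """Expression (26) with gamma normalised to 1."""
    return (1 + beta) * min(1 / g1 + 1 / h1, 1 / g2 + 1 / (h1 + h2))


def proxy_lower_bound(g1, g2, h1, h2, beta):
    """min{LB2', relay-1-only value}, where LB2' <= LB2 uses feasible points of (16)."""
    def objective(y1, y2, y3, y4):
        return y1 * (1 + beta * (g1 + g2) / g2) + (1 + beta) * (y2 + y3 + y4)

    candidates = [objective(1 / (g1 + g2), 1 / (h1 + h2), 0, 0)]
    if h1 <= g2:  # regime (i)
        candidates.append(objective(0, 1 / (2 * (h1 + h2)), 1 / (2 * g2), 0))
    else:         # regime (ii)
        candidates.append(objective((h1 - g2) / (h1 * (g1 + g2)), 0, 1 / h1, 0))
    return min(max(candidates), (1 + beta) * (1 / g1 + 1 / h1))


def worst_case_ratio(beta, levels=8):
    """Largest upper/proxy-lower ratio over a uniform grid of gains in (0, 1]."""
    grid = [i / levels for i in range(1, levels + 1)]
    worst = 0.0
    for g1, g2, h1, h2 in product(grid, repeat=4):
        if g2 > g1:  # keep the labelling in which relay 1 is the stronger relay
            continue
        ratio = upper_bound(g1, g2, h1, h2, beta) / proxy_lower_bound(g1, g2, h1, h2, beta)
        worst = max(worst, ratio)
    return worst


for beta in (0.0, 0.5, 1.0, 5.0):
    print(beta, worst_case_ratio(beta))
```

Replacing the proxy with an exact solution of (16) would tighten the reported ratios but would not change the fact that they stay below 2.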
7 Proof of Theorem 4



Our main objective in this section is to prove Theorem 4. The main idea is to show that, in order to 
achieve the minimum energy-per-bit on the network in Figure 1, there is no point in using a relay if it is 
not synchronized, where the notion of a synchronized relay is formalized in Definition 2. 

Recall that, in our definition of what it means for a sequence of codes to achieve an energy-per-bit $\epsilon_b$ (Definition 1), we require that code $C_k$, which operates on a channel with arrival distribution $\nu_{B_k}$, transmits $B_k$ bits. In this section, it will be useful to use an equivalent definition of achievable asynchronous energy-per-bit $\epsilon_b$. Under the assumption that the distribution $\nu_{B_k}$ is uniform over $[1 : 2^{\beta B_k}]$, we obtain the following result, whose proof is in Appendix V.

Lemma 4. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$, where code $C_k$ operates on a channel with uniform arrival distribution on $[1 : 2^{\beta B_k}]$ but only transmits $B_k - f(B_k)$ bits, with $f(\cdot) \ge 0$ and

$$\lim_{k\to\infty} \frac{f(B_k)}{B_k} = 0. \qquad (27)$$

Suppose that, in addition, this sequence of codes satisfies the following:

• $\lim_{k\to\infty} \Pr(\mathrm{error}(C_k)) = 0$

• $\lim_{k\to\infty} \dfrac{\log d_{B_k}}{B_k - f(B_k)} = 0$

• $\liminf_{k\to\infty} \dfrac{E[\mathcal{E}_{C_k}]}{B_k - f(B_k)} \le \epsilon_b$

Then, for any $\eta > 0$, this sequence can be used to construct a new sequence of codes $\{C'_k\}_{k=1}^{\infty}$, where code $C'_k$ operates on a channel with uniform arrivals on $[1 : 2^{\beta B'_k}]$ and transmits $B'_k$ bits, satisfying

• $\lim_{k\to\infty} \Pr(\mathrm{error}(C'_k)) = 0$

• $\lim_{k\to\infty} \dfrac{\log d'_{B'_k}}{B'_k} = 0$

• $\liminf_{k\to\infty} \dfrac{E[\mathcal{E}_{C'_k}]}{B'_k} \le (1+\eta)\epsilon_b$,

i.e., $\{C'_k\}$ achieves an energy-per-bit $(1+\eta)\epsilon_b$ according to the original definition.

This lemma allows us to regard the three conditions satisfied by the sequence of codes $\{C_k\}_{k=1}^{\infty}$ in the statement of Lemma 4 as an equivalent definition of what it means for a sequence of codes to achieve energy-per-bit $\epsilon_b$.

In order to prove Theorem 4, we will start with a sequence of asynchronous codes $\{C_k\}$ achieving energy-per-bit $\epsilon_b$, and we will make several modifications to it, until we obtain another sequence of codes with the properties stated in the theorem. These steps and the lemmas and theorems that make up the proof are summarized in the diagram in Figure 7.

Our first goal is to convert any given scheme into another scheme where the transmissions by the source are restricted to start at a few special "transmission times", and last at most $\ell$ time steps. Then, if each pair of consecutive transmission times is separated by more than $\ell$ time steps, at a given time $t$, if the source is transmitting, there is only one possible starting time for the transmission block. Intuitively, this will facilitate the relays' task. We formally define this notion as follows.

Definition 3. An asynchronous code $C_k$ is said to have non-overlapping transmission blocks if there is a set of times $\{t_1, t_2, \ldots, t_q\}$ with $t_1 < t_2 < \ldots < t_q$ and a transmission block length $\ell_k$, satisfying the following properties:

• $t_{i+1} - t_i \ge \ell_k + 1$, for $i = 1, \ldots, q-1$.

• The interval $[1 : A_k]$ is divided into $q$ subintervals $I_i = \left[\frac{(i-1)A_k}{q} + 1 : \frac{iA_k}{q}\right]$, $i = 1, \ldots, q$ (disregarding edge effects), such that $t_i \ge \frac{iA_k}{q}$, and if the message arrives at time $\nu_{B_k} \in I_i$, then the source keeps it buffered until the start of the block $[t_i : t_i + \ell_k - 1]$, where the transmissions occur. Moreover, given that the message arrived in $I_i$, the signals transmitted by the source during $[t_i : t_i + \ell_k - 1]$ are only a function of the message value, and not the actual arrival time.

• At a time $t \in [t_i : t_i + \ell_k - 1]$, the relays only need the signals received in $[t_i : t]$ to compute their relaying functions, and the destination only needs the signals received in $[t_i : t]$ to apply its detecting/decoding functions. If $t \notin [t_i : t_i + \ell_k - 1]$ for $i = 1, 2, \ldots, q$, then the relays stay silent, and the destination does not apply any detection/decoding function.

Figure 7: Main steps in the proof of Theorem 4, and informal descriptions of main lemmas and theorems. The diagram contains the following steps, in order:
- Sequence of codes $\{C_k\}$ achieving energy-per-bit $\epsilon_b$.
- (Lemma 5) Sequence of codes $\{C_k\}$ with non-overlapping transmission blocks achieving causal energy-per-bit $\approx \epsilon_b$.
- (Lemma 6) Sequence of codes $\{C_k\}$ with non-overlapping transmission blocks achieving causal energy-per-bit $\approx \epsilon_b$ uniformly over the messages.
- Lemma 7: For almost all arrival times, the probability that relay 1 uses little energy in the correct block is small.
- Theorem 6: Relay 1 can create a list containing $\hat{\nu}_{B_k}$ with high probability.
- Lemma 8: Either the probability that relay 2 uses little energy in the correct block is small for almost all arrival times, or it is not small for almost all arrival times.
- Theorem 4: Either both relays can create lists containing $\hat{\nu}_{B_k}$ with high probability, or relay 1 can create a list containing $\hat{\nu}_{B_k}$ with high probability while relay 2 stays silent.
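The buffering rule in Definition 3 can be sketched in a few lines: an arrival time is mapped to the subinterval that contains it, and transmission is deferred to the block associated with that subinterval. This is an illustrative sketch only; the function names are hypothetical, the subintervals are taken to be $[(i-1)A/q + 1 : iA/q]$ with $A$ divisible by $q$ (edge effects are ignored, as in the definition), and only the separation property between consecutive transmission times is checked.

```python
def block_index(arrival_time, A, q):
    """1-based index i of the subinterval of [1 : A] that contains arrival_time."""
    if not 1 <= arrival_time <= A:
        raise ValueError("arrival time outside [1 : A]")
    return (arrival_time * q + A - 1) // A  # integer ceiling of arrival_time * q / A


def blocks_do_not_overlap(starts, block_len):
    """Check t_{i+1} - t_i >= block_len + 1 for consecutive transmission times."""
    return all(b - a >= block_len + 1 for a, b in zip(starts, starts[1:]))


def transmission_window(arrival_time, A, starts, block_len):
    """Block [t_i : t_i + block_len - 1] in which a buffered message is sent."""
    t = starts[block_index(arrival_time, A, len(starts)) - 1]
    return t, t + block_len - 1


starts = [4, 8, 12]  # hypothetical transmission times for A = 12, q = 3
print(blocks_do_not_overlap(starts, block_len=3))
print(transmission_window(5, A=12, starts=starts, block_len=3))
```

Because every arrival inside a subinterval is mapped to the same block, the transmitted signal depends only on the message, which is the property the later lemmas rely on.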

We will in general refer to the set of transmission times of code $C_k$ as $S_k$. Notice that a non-overlapping transmission blocks scheme effectively induces a new message arrival distribution $\hat{\nu}_{B_k}$, where $\Pr(\hat{\nu}_{B_k} = t) = 1/|S_k|$ if $t \in S_k$ and $\Pr(\hat{\nu}_{B_k} = t) = 0$ otherwise. Then we have the following key result, whose proof is in Appendix VI.

Lemma 5. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $\epsilon_b$ on the asynchronous diamond network in Figure 1. Then we can build another sequence of codes $\{C'_k\}$ with delay constraint $d'_{B_k}$ subexponential in $B'_k = B_k$, with non-overlapping transmission blocks of length $\ell_k$, for which

$$\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{C'_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]}{B_k} \le (1+\eta)\epsilon_b,$$

for any arbitrarily small $\eta > 0$, and whose probability of error goes to 0 as $k \to \infty$.

Remark 1: The statement of Lemma 5 does not require the sequence of codes $\{C'_k\}$ to in fact achieve any energy-per-bit, according to Definition 1.

Remark 2: Notice that the energy spent in the code $C'_k$ after time $\hat{\nu}_{B_k} + \ell_k - 1$ is only spent by the relays. Therefore, if somehow the relays were able to decode $\hat{\nu}_{B_k}$ at time $\hat{\nu}_{B_k} + \ell_k - 1$, they could stop transmitting, and Lemma 5 would imply that we have a sequence of codes with non-overlapping transmission blocks achieving arbitrarily close to energy-per-bit $\epsilon_b$.

We will now move toward our main result for the 2-relay diamond network. Since we will be frequently dealing with the sequence of codes constructed via Lemma 5, the following definition will be useful.

Definition 4. A sequence of codes $\{C_k\}_{k=1}^{\infty}$ with non-overlapping transmission blocks achieves a causal energy-per-bit $\epsilon_b$, if it satisfies properties 1 and 2 in Definition 1, and

$$\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]}{B_k} \le \epsilon_b,$$

where $\ell_k$ is the transmission block length.

Remark: Consider a sequence of codes with non-overlapping transmission blocks that achieves a causal energy-per-bit $\epsilon_b$ with delay $d_{B_k}$ (which must be, according to Definition 1, subexponential in $B_k$). Notice that a message that arrives in the first half of $I_i$, for any $i$, cannot be decoded with a delay smaller than $|I_i|/2 = A_k/(2|S_k|)$. Since the message arrives in the first half of some $I_i$ with probability 1/2, and the error probability goes to 0, Definition 4 implicitly requires that $d_{B_k} \ge A_k/(2|S_k|)$, for $k$ sufficiently large. Therefore, since the delay $d_{B_k}$ must be subexponential in $B_k$, we must also have $A_k/|S_k|$ subexponential in $B_k$. This fact will be used in subsequent proofs.

Lemma 5 states that we can take any sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving energy-per-bit $\epsilon_b$ and use it to build another sequence of codes $\{C'_k\}$ which achieves a causal energy-per-bit arbitrarily close to $\epsilon_b$ and has non-overlapping transmission blocks. Our first goal will be to show that any such sequence of codes can be converted into yet another sequence of codes, achieving the same causal energy-per-bit, where either both relays decode $\hat{\nu}_{B_k}$ exactly, or relay 1 decodes $\hat{\nu}_{B_k}$ exactly and relay 2 is not used at all. This way, the relays that are actually used for communication can decode $\hat{\nu}_{B_k}$ and stop transmitting at time $\hat{\nu}_{B_k} + \ell_k - 1$. This will allow us to convert our sequence of codes that achieve causal energy-per-bit $\epsilon_b$ to a sequence of codes that in fact achieves energy-per-bit $\epsilon_b$. In addition, this new sequence will have the property that any relay that is used must be synchronized.

In the process of proving Theorem 4, an important step will be to restrict the set of messages that can be sent by a given code to only those with some special properties. Because of that, it will be important that the energy spent by the code does not vary too much depending on the message that is sent. This will allow us to restrict a code to only sending a certain subset of the messages without having the average energy-per-bit change much.

Consider a code $C_k$ with non-overlapping transmission blocks. Now, suppose that for each transmission time $t_i$ we have an injective mapping $\phi_{k,i} : \{1, \ldots, M_k\} \to \{1, \ldots, 2^{B_k}\}$. We will let $\phi_k$ represent the ensemble of all these mappings, i.e., $\phi_k = \{\phi_{k,1}, \phi_{k,2}, \ldots, \phi_{k,|S_k|}\}$. Then we have the following definition.
Definition 5. The restriction of code $C_k$ according to $\phi_k$, denoted by $C_k^{\phi_k}$, is a new code with message set $\{1, \ldots, M_k\}$, which, given that message $m \in \{1, \ldots, M_k\}$ (effectively) arrives at time $\hat{\nu}_{B_k} = t_i$, transmits the message $\phi_{k,i}(m)$ using code $C_k$. The destination applies the same decoder of code $C_k$, and then uses $\phi_{k,i}^{-1}$ to decode $m$ (declaring an error if $\phi_{k,i}^{-1}$ does not map to any element in $\{1, \ldots, M_k\}$).
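The restriction operation of Definition 5 amounts to wrapping an existing encoder and decoder with a per-block injective relabelling of messages. The sketch below illustrates this with a toy code; the interface (`encode(i, w)`, `decode(i, received)`, with `i` the index of the transmission block) is a hypothetical one chosen for the example, and `None` stands in for "declare an error".

```python
def restrict_code(encode, decode, phi):
    """Build the restricted encoder/decoder pair for per-block injective maps phi[i]."""
    inverses = []
    for mapping in phi:
        inverse = {codeword_index: m for m, codeword_index in mapping.items()}
        if len(inverse) != len(mapping):
            raise ValueError("each mapping must be injective")
        inverses.append(inverse)

    def encode_restricted(i, m):
        # Send phi_{k,i}(m) with the original code.
        return encode(i, phi[i][m])

    def decode_restricted(i, received):
        # Apply the original decoder, then invert phi_{k,i}; None signals an error.
        return inverses[i].get(decode(i, received))

    return encode_restricted, decode_restricted


# Toy original code: the "signal" is just the pair (block index, message index).
encode = lambda i, w: (i, w)
decode = lambda i, received: received[1]
phi = [{1: 3, 2: 7}, {1: 2, 2: 5}]  # two transmission times, M_k = 2

enc, dec = restrict_code(encode, decode, phi)
print(enc(0, 2), dec(0, enc(0, 2)), dec(1, (1, 7)))
```

The restricted code reuses the original codewords unchanged, so its energy profile is that of the original code evaluated on the selected subset of messages.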

We will be interested in codes $C_k$ for which any restriction yields a good code, as defined next.

Definition 6. A sequence of codes $\{C_k\}_{k=1}^{\infty}$ with non-overlapping transmission blocks that achieves a causal energy-per-bit $\epsilon_b$ is said to achieve a causal energy-per-bit $\epsilon_b$ uniformly over the messages if for any sequence of message restrictions $\{\phi_k\}$ we have

$$\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{C_k^{\phi_k}}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]}{\log M_k} \le \epsilon_b. \qquad (28)$$

Then we have the following lemma, whose proof is in Appendix VII.

Lemma 6. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ with non-overlapping transmission blocks achieving a causal energy-per-bit $\epsilon_b$ on the asynchronous diamond network in Figure 1. Then we can have a sequence of codes $\{C'_k\}$ that have non-overlapping transmission blocks, achieving a causal energy-per-bit $(1+\eta)\epsilon_b$, uniformly over the messages, for any $\eta > 0$.

In order to show our main result, i.e., that any relay that is used may be assumed to be synchronized, 
we first focus on relay 1, which is the relay that has a stronger channel from the source. 

Theorem 6. For any sequence of codes $\{C_k\}_{k=1}^{\infty}$ for the asynchronous diamond network with non-overlapping transmission blocks achieving a causal energy-per-bit $\epsilon_b$ uniformly over the messages, relay 1 can create a list $\Lambda_k \subset S_k$ which contains $\hat{\nu}_{B_k}$ with vanishing error probability, and has a size $|\Lambda_k|$ that is subexponential in $B_k$. Moreover, each $t$ in the list is added to it no later than at time $t + \ell_k - 1$.

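Proof. Let $\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})]$ be the energy used by relay $i$ in the transmission block $W(\hat{\nu}_{B_k}) = [\hat{\nu}_{B_k} : \hat{\nu}_{B_k} + \ell_k - 1]$, when using code $C_k$. Consider a sequence of non-negative numbers $\{\epsilon_k\}$ for which $\epsilon_k \to 0$. Let the set $T(\alpha, C_k, \epsilon_k)$ of arrival times be defined as

$$T(\alpha, C_k, \epsilon_k) = \left\{ t \in S_k : \Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \,\middle|\, \hat{\nu}_{B_k} = t \right) \le \epsilon_k \right\}. \qquad (29)$$

The proof of the following lemma is in Appendix VIII.

Lemma 7. There exists an $\alpha > 0$ and a non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$, such that

$$\limsup_{k\to\infty} \Pr\left(\hat{\nu}_{B_k} \in T(\alpha, C_k, \epsilon_k)\right) = 1. \qquad (30)$$

Notice that, for the $\alpha > 0$ and the sequence $\{\epsilon_k\}$ provided by Lemma 7, we can actually replace the lim sup in (30) with a limit, since one can consider the subsequence of $\{C_k\}_{k=1}^{\infty}$ for which the limit exists and is the lim sup of the original sequence. Therefore, we will assume that (30) holds with lim sup replaced by lim.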

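We will show that, if (30) holds, relay 1 can implement a list detector $\Lambda_k$ for $\hat{\nu}_{B_k}$ with probability of error going to 0, and whose list size is subexponential in $B_k$. But first we describe a scheme in which each relay $i$, $i = 1, 2$, implements a list decoder $\Lambda_{k,i}$ for $\hat{\nu}_{B_k}$ based on its transmit signals $X_i(t)$, $t = 1, \ldots, A_k + d_{B_k} - 1$, and we show that the probability that both decoders make an error at the same time goes to 0. The list decoder $\Lambda_{k,i}$ for $\hat{\nu}_{B_k}$ selects the first $N_k = E\left[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]/\alpha$ transmission blocks where the energy consumed by relay $i$ is at least $\alpha B_k$, and lists the corresponding transmission times. Let $\mathcal{E}^{(r_i)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]$ be the total energy consumed by relay $i$ up to time $\hat{\nu}_{B_k} + \ell_k - 1$. Notice that, if

$$\mathcal{E}^{(r_i)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] \le B_k\, E\left[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right] \qquad (31)$$

and

$$\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})] \ge B_k \alpha, \qquad (32)$$

then

$$\frac{\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})]}{\mathcal{E}^{(r_i)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]} \ge \frac{B_k \alpha}{B_k\, E\left[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]} = N_k^{-1}.$$

Moreover, there can be at most $N_k$ transmission blocks $W(t_j)$ in $[1 : \hat{\nu}_{B_k} + \ell_k - 1]$ satisfying

$$\frac{\mathcal{E}^{(r_i)}_{C_k}[W(t_j)]}{\mathcal{E}^{(r_i)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]} \ge N_k^{-1},$$

which implies that, if (31) and (32) are satisfied, the list decoder will be correct, i.e., $\hat{\nu}_{B_k} \in \Lambda_{k,i}$. Writing $W = W(\hat{\nu}_{B_k})$, $\mathcal{E}^{(r_i)}[\cdot] = \mathcal{E}^{(r_i)}_{C_k}[\cdot]$ and $\bar{E}_k = E\left[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]\right]$ for brevity, the probability of error at both decoders is then given by

$$
\begin{aligned}
\Pr(\text{error at } \Lambda_{k,1} \text{ and } \Lambda_{k,2})
&= \Pr\left(\hat{\nu}_{B_k} \notin \Lambda_{k,1} \text{ and } \hat{\nu}_{B_k} \notin \Lambda_{k,2}\right)\\
&\le \Pr\Big[ \left( \mathcal{E}^{(r_1)}[W] < \alpha B_k \,\cup\, \mathcal{E}^{(r_1)}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k \bar{E}_k \right)\\
&\qquad\;\; \cap \left( \mathcal{E}^{(r_2)}[W] < \alpha B_k \,\cup\, \mathcal{E}^{(r_2)}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k \bar{E}_k \right) \Big]\\
&\overset{(i)}{\le} \Pr\Big[ \left( \mathcal{E}^{(r_1)}[W] < \alpha B_k \,\cap\, \mathcal{E}^{(r_2)}[W] < \alpha B_k \right)\\
&\qquad\;\; \cup \left( \mathcal{E}^{(r_1)}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k \bar{E}_k \right) \cup \left( \mathcal{E}^{(r_2)}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k \bar{E}_k \right) \Big]\\
&\le \Pr\left( \mathcal{E}^{(r_1)}[W] < \alpha B_k \,\cap\, \mathcal{E}^{(r_2)}[W] < \alpha B_k \right) + 2 \Pr\left( \mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k \bar{E}_k \right)\\
&\overset{(ii)}{\le} \Pr\left( \mathcal{E}^{(r_1)}[W] < \alpha B_k \,\cap\, \mathcal{E}^{(r_2)}[W] < \alpha B_k \,\middle|\, \hat{\nu}_{B_k} \in T(\alpha, C_k, \epsilon_k) \right)\\
&\qquad + \Pr\left(\hat{\nu}_{B_k} \notin T(\alpha, C_k, \epsilon_k)\right) + 2/B_k\\
&\overset{(iii)}{\le} 2/B_k + \epsilon_k + \Pr\left(\hat{\nu}_{B_k} \notin T(\alpha, C_k, \epsilon_k)\right),
\end{aligned}
\qquad (33)
$$

where (i) follows by noticing that if we have four events $A_1$, $A_2$, $B_1$ and $B_2$, then $(A_1 \cup B_1) \cap (A_2 \cup B_2)$ implies $(A_1 \cap A_2) \cup B_1 \cup B_2$, (ii) follows from Markov's inequality and (iii) follows from the fact that

$$
\begin{aligned}
&\Pr\left( \mathcal{E}^{(r_1)}[W] < \alpha B_k \,\cap\, \mathcal{E}^{(r_2)}[W] < \alpha B_k \,\middle|\, \hat{\nu}_{B_k} \in T(\alpha, C_k, \epsilon_k) \right)\\
&\quad = \sum_{t \in T(\alpha, C_k, \epsilon_k)} \Pr\left(\hat{\nu}_{B_k} = t \,\middle|\, \hat{\nu}_{B_k} \in T(\alpha, C_k, \epsilon_k)\right) \Pr\left( \frac{\mathcal{E}^{(r_1)}[W]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}[W]}{B_k} < \alpha \,\middle|\, \hat{\nu}_{B_k} = t \right) \le \epsilon_k.
\end{aligned}
$$

Therefore, since $\Pr\left(\hat{\nu}_{B_k} \notin T(\alpha, C_k, \epsilon_k)\right) \to 0$, we have that $\Pr(\text{error at } \Lambda_{k,1} \text{ and } \Lambda_{k,2}) \to 0$.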
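The energy-threshold list decoder described above can be sketched as a simple scan over transmission blocks. This is an illustrative sketch with hypothetical names and toy numbers: `block_energies[j]` is the energy a relay spends in the block starting at `block_starts[j]`, and `max_list_size` plays the role of $N_k$.

```python
def list_decoder(block_starts, block_energies, alpha, B, max_list_size):
    """Return the first max_list_size block start times whose energy is >= alpha * B."""
    selected = []
    for start, energy in zip(block_starts, block_energies):
        if energy >= alpha * B:
            selected.append(start)
            if len(selected) == max_list_size:
                break
    return selected


starts = [10, 20, 30, 40, 50]
energies = [0.1, 5.0, 0.2, 6.0, 7.0]  # toy per-block relay energies
print(list_decoder(starts, energies, alpha=0.5, B=8, max_list_size=2))
```

The decoder only looks at the relay's own transmit energy per block, so each listed time can be decided by the end of its block, in line with the timing requirement in Theorem 6.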

Next, we want to use a similar argument to show that relay 1 can implement by itself a list detector with a list size linear in $B_k$ and probability of error going to 0 as $k \to \infty$. In order to do that, notice that, since the channel gain from source to relay 1, $g_1$, is stronger than the channel gain from source to relay 2, $g_2$, relay 1 can "virtually" simulate the received signal of relay 2, and then simulate the output of the relaying functions of relay 2, thus being able to implement the list decoder based on the transmit signals of relay 2 as well. To simulate the received signal at relay 2, relay 1 multiplies its received signal by $\sqrt{g_2/g_1}$ and then adds a Gaussian noise with variance $1 - g_2/g_1$ to it. It is easy to see that the resulting signal has the same marginal statistics as the signal received at relay 2.
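The degradation step can be illustrated with a short simulation. The sketch below assumes a unit-variance Gaussian noise model at each relay (so that relay 1 observes $\sqrt{g_1}\,x + z_1$), which is the normalization implied by the noise variance $1 - g_2/g_1$ in the text; the function name and the sample values are hypothetical.

```python
import math
import random
import statistics


def simulate_relay2_observation(y1, g1, g2, rng):
    """Scale relay 1's sample by sqrt(g2/g1) and add N(0, 1 - g2/g1) noise."""
    if not 0 < g2 <= g1:
        raise ValueError("requires 0 < g2 <= g1")
    scale = math.sqrt(g2 / g1)
    extra_noise_std = math.sqrt(1 - g2 / g1)
    return scale * y1 + rng.gauss(0.0, extra_noise_std)


rng = random.Random(0)
x, g1, g2 = 2.0, 4.0, 1.0
samples = []
for _ in range(20000):
    y1 = math.sqrt(g1) * x + rng.gauss(0.0, 1.0)  # relay 1's received sample
    samples.append(simulate_relay2_observation(y1, g1, g2, rng))

# The simulated samples should look like sqrt(g2) * x plus unit-variance noise.
print(statistics.fmean(samples), statistics.pvariance(samples))
```

The scaled noise contributes variance $g_2/g_1$ and the added noise contributes $1 - g_2/g_1$, so the total is 1; only the marginal law is matched, which is why the joint-distribution issue discussed next arises.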

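Assume that $\tilde{X}_2(t)$ is the signal that would be transmitted from relay 2 at time $t$ according to relay 1's simulation. Relay 1 can use the list decoder $\Lambda_k = \Lambda_{k,1} \cup \tilde{\Lambda}_{k,2}$, where $\tilde{\Lambda}_{k,2}$ is the list decoder based on $\tilde{X}_2(t)$, $t = 1, \ldots, A_k + d_{B_k} - 1$. Notice that $\tilde{X}_2$ has the same distribution as $X_2$, but the joint distributions of $(X_1, X_2)$ and $(X_1, \tilde{X}_2)$ are different, which is why the previous argument does not work to show that the error probability of this list decoder goes to 0. In particular, if we let $\tilde{\mathcal{E}}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]$ be the energy used in the simulated signal $\tilde{X}_2$ in the window $W(\hat{\nu}_{B_k}) = [\hat{\nu}_{B_k}, \hat{\nu}_{B_k} + d_{B_k} - 1]$, we cannot say that if $t \in T(\alpha, C_k, \epsilon_k)$ then

$$\Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\tilde{\mathcal{E}}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \,\middle|\, \hat{\nu}_{B_k} = t \right) \le \epsilon_k.$$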
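To solve this problem, we start by noticing that we can write, for $t \in \mathcal{T}(\alpha, C_k, \epsilon_k)$,

\begin{align}
\epsilon_k &\ge \Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t \right) \nonumber \\
&\overset{(i)}{=} \sum_{m=1}^{2^{B_k}} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right) \nonumber \\
&\overset{(ii)}{=} \sum_{m=1}^{2^{B_k}} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right) \nonumber \\
&\qquad\qquad \times \Pr\left( \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right) \tag{34} \\
&\ge \sum_{m=1}^{2^{B_k}} 2^{-B_k} \min_{i \in \{1,2\}} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right)^2, \tag{35}
\end{align}
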



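where (i) follows from the independence of $\hat{\nu}_{B_k}$ and $m$, and (ii) follows from the fact that, given $\hat{\nu}_{B_k}$
and $m$, $X_1$ and $X_2$ are independent. Now we notice that

\begin{align}
&\Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\tilde{\mathcal{E}}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t \right) \nonumber \\
&\quad = \sum_{m=1}^{2^{B_k}} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\tilde{\mathcal{E}}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right) \nonumber \\
&\quad \le \sum_{m=1}^{2^{B_k}} 2^{-B_k} \min_{i \in \{1,2\}} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right). \tag{36}
\end{align}
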



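From the Cauchy-Schwarz inequality, for any numbers $a_1, \ldots, a_M$, we have

\[
\frac{1}{M} \sum_{i=1}^{M} a_i \le \sqrt{ \frac{1}{M} \sum_{i=1}^{M} a_i^2 } .
\]

Thus, we can combine (35) and (36) to obtain

\begin{align}
&\Pr\left( \frac{\mathcal{E}^{(r_1)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \text{ and } \frac{\tilde{\mathcal{E}}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t \right) \nonumber \\
&\quad \le \sqrt{ \sum_{m=1}^{2^{B_k}} 2^{-B_k} \min_{i \in \{1,2\}} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right)^2 } \le \sqrt{\epsilon_k} . \tag{37}
\end{align}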
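
The form of the Cauchy-Schwarz inequality used here says that an average never exceeds the corresponding root-mean-square. A small numerical sketch in Python (our own illustration, with arbitrary test values standing in for the per-message probabilities) makes the step from (35) and (36) to (37) concrete:

```python
import math


def mean_and_root_mean_square(values):
    """Return (mean, root-mean-square) of a non-empty list of numbers."""
    count = len(values)
    if count == 0:
        raise ValueError("values must be non-empty")
    mean = sum(values) / count
    rms = math.sqrt(sum(v * v for v in values) / count)
    return mean, rms


if __name__ == "__main__":
    # Hypothetical per-message probabilities a_m, equally weighted.
    probabilities = [0.0, 0.1, 0.4, 0.05, 0.9]
    mean, rms = mean_and_root_mean_square(probabilities)
    print(mean <= rms)
```

With $a_m$ equal to the minimum over $i$ of the per-message probabilities, the mean is the right-hand side of (36) and the squared root-mean-square is the right-hand side of (35).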



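Now, using (37), it is possible to repeat the same steps we used in (33) to obtain

\[
\Pr\big( \hat{\nu}_{B_k} \notin \Lambda_{k,1} \cup \tilde{\Lambda}_{k,2} \big) \le 2/B_k + \sqrt{\epsilon_k} + \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}(\alpha, C_k, \epsilon_k) \big),
\]

and we conclude that $\Pr(\hat{\nu}_{B_k} \notin \Lambda_{k,1} \cup \tilde{\Lambda}_{k,2}) \to 0$. This implies that the list decoder implemented by
relay 1 alone, which has list size $2 d_{B_k} E[\mathcal{E}_{C_k}]/\alpha$ (which is subexponential in $B_k$), contains $\hat{\nu}_{B_k}$ with
vanishing error probability. □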

Notice that the result above implies that, in the case where $g_1 = g_2$, both relays can implement the
list decoders for $\hat{\nu}_{B_k}$, and each of them will have vanishing error probability.

In Theorem 6, we learned that in any scheme that achieves a finite causal energy-per-bit, relay 1 can
approximately decode $\hat{\nu}_{B_k}$ with vanishing error probability. Next we address relay 2, the weaker relay.
Similar to what we did in Theorem 6, we will define the set of arrival times

\begin{equation}
\mathcal{T}_2(\alpha, C_k, \epsilon_k) = \left\{ t_i \in S_k : \Pr\left( \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t_i \right) \le \epsilon_k \right\}, \tag{38}
\end{equation}

where $S_k$ is the set of transmission times of code $C_k$ and $W(\hat{\nu}_{B_k}) = [\hat{\nu}_{B_k} : \hat{\nu}_{B_k} + \ell_k - 1]$ is the
transmission block associated to $\hat{\nu}_{B_k}$. As in Theorem 6, where we used Lemma 7 to characterize the
asymptotic behavior of $\Pr[\hat{\nu}_{B_k} \in \mathcal{T}(\alpha, C_k, \epsilon_k)]$, we have the following result.

Lemma 8. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $\epsilon_b$ on the
asynchronous diamond network in Figure 1. Consider any $\alpha > 0$ and any non-negative sequence $\{\epsilon_k\}$,
with $\epsilon_k \to 0$. Then, for any $\eta > 0$, we can have a sequence of codes $\{C'_k\}$ achieving a causal energy-
per-bit $(1 + \eta)\epsilon_b$ uniformly over the messages that have non-overlapping transmission blocks, and for
which one of the following is true:

(a) $\limsup_{k \to \infty} \Pr(\hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k)) = 1$,

(b) $\liminf_{k \to \infty} \Pr(\hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k)) = 0$,

where $\mathcal{T}_2(\alpha, C_k, \epsilon_k)$ is defined in (38).

Lemma 8 will be the basis of the proof of Theorem 4. Intuitively, if a sequence of codes satisfies (a)
then relay 2 should be able to approximately decode the arrival time $\hat{\nu}_{B_k}$. Otherwise, if (b) is satisfied,
then we can find yet another sequence of codes which does not use relay 2 and achieves the same
energy-per-bit. We can now prove Theorem 4, which we restate here.

Theorem 4. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $\epsilon_b$ on the
asynchronous diamond network in Figure 1. Then we can achieve arbitrarily close to the energy-per-bit
$\epsilon_b$ with a sequence of codes $\{C'_k\}$ for which one of the following is true:

(a) Relay $i$, for $i = 1, 2$, can create a list $\Lambda_k^{(r_i)} \subset [1 : A_k]$ based on their received signals, such that
$\nu_{B_k} \in \Lambda_k^{(r_i)}$ with vanishing error probability and list size $|\Lambda_k^{(r_i)}|$ subexponential in $B_k$

(b) Relay 1 can create a list $\Lambda_k^{(r_1)} \subset [1 : A_k]$ based on its received signals, such that $\nu_{B_k} \in \Lambda_k^{(r_1)}$
with vanishing error probability and list size $|\Lambda_k^{(r_1)}|$ subexponential in $B_k$, and relay 2 is inactive
(i.e., does not transmit any signal)

Proof. Fix some $\alpha > 0$ and some non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$. We start by using Lemma 8
in order to assume that our original sequence of codes $\{C_k\}_{k=1}^{\infty}$ has non-overlapping transmission blocks
of length $\ell_k$, achieves a causal energy-per-bit $\epsilon_b$ uniformly over the messages, and satisfies either

\begin{align}
\lim_{k \to \infty} \Pr\big( \hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) &= 1, \text{ or} \tag{39} \\
\lim_{k \to \infty} \Pr\big( \hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) &= 0. \tag{40}
\end{align}

Notice that the lim sup and lim inf in the statement of Lemma 8 can be replaced by limits by simply
restricting $\{C'_k\}$ to the corresponding subsequences. Also notice that, if the set of transmission times
for the code $C_k$ is given by $S_k$, our delay for $\{C_k\}$ is at most $2A_k/|S_k| + \ell_k$, which must be subexponential
in $B_k$.

We consider case (39) first. We follow very similar steps to those used when we created the list
decoder for the relays in Theorem 6. Relay 2 will use its transmit signals to implement a list decoder $\Lambda_k$
for $\hat{\nu}_{B_k}$ with probability of error going to 0, whose list size is subexponential in $B_k$. Since each effective
arrival time $\hat{\nu}_{B_k}$ corresponds to exactly $A_k/|S_k|$ actual arrival times $\nu_{B_k}$, we see that the list decoder for $\hat{\nu}_{B_k}$ can
then be converted to a list decoder for $\nu_{B_k}$ with a list $A_k/|S_k|$ times longer. Since $A_k/|S_k|$ is subexponential in
$B_k$, so is the size of the list for the resulting list decoder for the actual arrival time $\nu_{B_k}$.

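The list decoder $\Lambda_k$ for $\hat{\nu}_{B_k}$ selects the first $N_k = E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]]/\alpha$ transmission blocks where
the energy consumed by relay 2 is at least $\alpha B_k$, and lists the corresponding transmission times. Let
$\mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]$ be the total energy consumed by relay 2 up to time $\hat{\nu}_{B_k} + \ell_k - 1$. Notice that, if

\begin{align}
\mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] &< B_k E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] \text{ and} \tag{41} \\
\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] &\ge B_k \alpha, \tag{42}
\end{align}

then

\[
\frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{\mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]} > \frac{\alpha}{E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]]} = N_k^{-1} .
\]

Moreover, there can be at most $N_k$ transmission blocks $W(t_i)$ in $[1 : \hat{\nu}_{B_k} + \ell_k - 1]$ satisfying
$\mathcal{E}^{(r_2)}_{C_k}[W(t_i)] / \mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > N_k^{-1}$, which implies that, if (41) and
(42) are satisfied, the list decoder $\Lambda_k$ will be correct, i.e., $\hat{\nu}_{B_k} \in \Lambda_k$. The probability of error of the list
detector is thus given by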

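\begin{align}
\Pr(\hat{\nu}_{B_k} \notin \Lambda_k) &\le \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] \,\cup\, \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < B_k \alpha \big) \nonumber \\
&\le \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < B_k \alpha \big) + \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] \big) \nonumber \\
&\le \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < B_k \alpha \big) + \Pr\big( \mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] > B_k E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] \big) \nonumber \\
&\overset{(i)}{\le} \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < B_k \alpha \mid \hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) + \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) + 1/B_k \nonumber \\
&\overset{(ii)}{\le} 1/B_k + \epsilon_k + \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big), \tag{43}
\end{align}

where (i) follows from Markov's inequality and (ii) follows from the fact that

\begin{align*}
&\Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) \\
&\quad = \sum_{t \in \mathcal{T}_2(\alpha, C_k, \epsilon_k)} \Pr\big( \hat{\nu}_{B_k} = t \mid \hat{\nu}_{B_k} \in \mathcal{T}_2(\alpha, C_k, \epsilon_k) \big) \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t \big) \le \epsilon_k .
\end{align*}

Since we are in case (39), we have $\Pr(\hat{\nu}_{B_k} \notin \mathcal{T}_2(\alpha, C_k, \epsilon_k)) \to 0$, and therefore $\Pr(\hat{\nu}_{B_k} \notin \Lambda_k) \to 0$.
Moreover, since $g_1 \ge g_2$, we know that, by adding some extra Gaussian noise to its received signal,
relay 1 can simulate the received signals at relay 2, and compute what the output of relay 2 would have
been at each time $t$. Thus, relay 1 can create a list decoder for $\hat{\nu}_{B_k}$ based on the simulated output of
relay 2, and since it will be statistically equal to the actual list decoder from relay 2, its error probability
will also tend to 0 as $k \to \infty$.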
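
The energy-threshold list decoder analysed above is simple to state operationally. The sketch below is illustrative only: the function and argument names are ours, per-block energies are taken as given numbers, and the caller supplies the list-size cap that plays the role of $N_k$.

```python
def energy_list_decoder(block_starts, block_energies, alpha, num_bits, max_list_size):
    """List the start times of the first `max_list_size` transmission blocks whose
    relay energy is at least alpha * num_bits.

    block_starts   -- start time of each transmission block, in increasing order
    block_energies -- energy the relay spent in each block (same order)
    alpha          -- per-bit energy threshold (alpha in the text)
    num_bits       -- number of message bits (B_k in the text)
    max_list_size  -- cap on the list size (N_k in the text)
    """
    if max_list_size <= 0:
        return []
    threshold = alpha * num_bits
    selected = []
    for start, energy in zip(block_starts, block_energies):
        if energy >= threshold:
            selected.append(start)
            if len(selected) >= max_list_size:
                break
    return selected


if __name__ == "__main__":
    # Hypothetical numbers: four blocks of length 10, threshold 0.2 * 10 = 2.0.
    print(energy_list_decoder([1, 11, 21, 31], [0.5, 3.0, 0.1, 5.0], 0.2, 10, 2))
```

The decoder is correct whenever the true block clears the threshold and fewer than `max_list_size` earlier blocks do, which is what conditions (41) and (42) guarantee.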

Now, we need to take care of the fact that our codes only achieve causal energy-per-bit $\epsilon_b$. To fix this,
we will use the fact that both relays are approximately decoding the effective arrival time $\hat{\nu}_{B_k}$ (they have
a list of subexponential size in $B_k$ containing $\hat{\nu}_{B_k}$ with high probability), to improve the coding scheme
such that both relays can decode $\hat{\nu}_{B_k}$ exactly. In order to do that, we will have the source transmitting a
pulse after the transmission block. Define $U_k = 2B_k\epsilon_b/\alpha$. Notice that the fact that $\{C_k\}_{k=1}^{\infty}$ achieves a
causal energy-per-bit $\epsilon_b$ implies that $\liminf_{k \to \infty} E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]]/B_k \le \epsilon_b$. This, in turn, implies that for a subsequence
of codes $\{C_{k_j}\}$, $N_{k_j} \le 2B_{k_j}\epsilon_b/\alpha = U_{k_j}$. Therefore, we will restrict ourselves to this subsequence and drop
the notation $k_j$ for simplicity. We will have the source transmitting a pulse of magnitude $2\sqrt{4 \ln U_k / g_2}$
at time $\hat{\nu}_{B_k} + \ell_k$. Notice that this time was previously not used by the scheme due to the non-overlapping
transmission blocks requirement that $t_{i+1} > t_i + \ell_k + 1$. The relays, after adding a transmission time $t_i$
to the list, use a threshold detector at time $t_i + \ell_k$ with threshold $\sqrt{4 \ln U_k}$. If a pulse is found at $t_i + \ell_k$,
the relay declares $\hat{\nu}_{B_k} = t_i$, and stops transmitting after that point. This way, we will be converting our
scheme that achieves a causal energy-per-bit to a scheme that actually achieves an energy-per-bit $\epsilon_b$.
However, a further modification needs to be made before we can bound the energy used by this code.
We let $L_k$ be the event that both relays correctly detect the pulse, thus decoding $\hat{\nu}_{B_k}$ correctly. We also
let $\Lambda_k^{(r_i)}$ be the list decoder from relay $i$. Then we have



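\begin{align*}
\Pr(L_k^c) &= \Pr\big( \hat{\nu}_{B_k} \notin \Lambda_k^{(r_1)} \cup \hat{\nu}_{B_k} \notin \Lambda_k^{(r_2)} \big) \\
&\qquad + \Pr\big( \text{error in pulse detection} \cap \hat{\nu}_{B_k} \in \Lambda_k^{(r_1)} \cap \hat{\nu}_{B_k} \in \Lambda_k^{(r_2)} \big) \\
&\le 2\Pr\big( \hat{\nu}_{B_k} \notin \Lambda_k^{(r_2)} \big) + \Pr\big( \text{error in pulse detection} \mid \hat{\nu}_{B_k} \in \Lambda_k^{(r_1)} \cap \hat{\nu}_{B_k} \in \Lambda_k^{(r_2)} \big),
\end{align*}

and, from (43), we know that the first term tends to 0 as $k \to \infty$. For the second term, we have

\begin{align}
&\Pr\big( \text{error in pulse detection} \mid \hat{\nu}_{B_k} \in \Lambda_k^{(r_1)} \cap \hat{\nu}_{B_k} \in \Lambda_k^{(r_2)} \big) \nonumber \\
&\quad \le \Pr\big( \exists\, t \in \Lambda_k^{(r_i)},\ i = 1 \text{ or } i = 2,\ t \ne \hat{\nu}_{B_k} : Y_i(t + \ell_k) > \sqrt{4 \ln U_k} \big) \nonumber \\
&\qquad + \Pr\big( Y_i(\hat{\nu}_{B_k} + \ell_k) < \sqrt{4 \ln U_k},\ i \in \{1, 2\} \big) \nonumber \\
&\quad \le 2|\Lambda_k| \Pr\big( Z > \sqrt{4 \ln U_k} \big) + 2\Pr\big( Z < -\sqrt{4 \ln U_k} \big) \nonumber \\
&\quad \le 2U_k e^{-2 \ln U_k} + 2e^{-2 \ln U_k} = 2\big( U_k^{-1} + U_k^{-2} \big), \tag{44}
\end{align}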



and we conclude that $\Pr(L_k^c) \to 0$ as $k \to \infty$. Then we define $\zeta_k = \Pr(L_k^c)$, and we will have both
relays stay silent in the last $\sqrt{\zeta_k}|S_k|$ transmission blocks, where $S_k$ is the set of transmission times. It is
easy to see that the probability of error of the resulting code still goes to 0 as $k \to \infty$. Since the relays
stop transmitting after detecting a pulse at time $t_i + \ell_k$ for $t_i \in \Lambda_k$, we can now bound the energy used
by the resulting code $C'_k$ as

\begin{align}
E[\mathcal{E}_{C'_k}] &\overset{(i)}{\le} E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] + E\big[ \mathcal{E}_{C'_k}[\hat{\nu}_{B_k} + \ell_k : t_{(1-\sqrt{\zeta_k})|S_k|-1} + \ell_k - 1] \big] \nonumber \\
&\overset{(ii)}{\le} E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] + \frac{16 \ln U_k}{g_2} + \Pr(L_k^c)\, E\big[ \mathcal{E}_{C_k}[\hat{\nu}_{B_k} + \ell_k : t_{(1-\sqrt{\zeta_k})|S_k|-1} + \ell_k - 1] \,\big|\, L_k^c \big] \nonumber \\
&\overset{(iii)}{\le} E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] + \frac{16 \ln U_k}{g_2} + \Pr(L_k^c)\, \frac{E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]]}{\Pr\big( \hat{\nu}_{B_k} > t_{(1-\sqrt{\zeta_k})|S_k|} \big)} \nonumber \\
&\le \big( 1 + \sqrt{\zeta_k} \big) E[\mathcal{E}_{C_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1]] + \frac{16 \ln U_k}{g_2}, \tag{45}
\end{align}



where (i) follows because, up to time $\hat{\nu}_{B_k} + \ell_k - 1$, the energy used by code $C'_k$ is the same as the energy
used by $C_k$ unless a pulse is incorrectly detected, in which case it is less; (ii) follows because the energy
spent from time $\hat{\nu}_{B_k} + \ell_k$ on is the energy used in the pulse and then either 0 if the pulse is detected,
or the energy that would be spent otherwise; (iii) follows from the fact that $C_k$ has non-overlapping
transmission blocks, and, if the pulse is missed, $C'_k$ behaves as $C_k$ would have behaved if the message
had not arrived yet. Now, since the sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieves a causal energy-per-bit $\epsilon_b$, it is
easy to see that

\[
\liminf_{k \to \infty} \frac{E[\mathcal{E}_{C'_k}]}{B_k} \le \epsilon_b,
\]

which means that $C'_k$ achieves an energy-per-bit $\epsilon_b$ with both relays decoding the effective arrival time
$\hat{\nu}_{B_k}$ exactly. Therefore, this decoder for $\hat{\nu}_{B_k}$ can be converted into a list decoder for $\nu_{B_k}$ with a list of
size $A_k/|S_k|$, which is subexponential in $B_k$.
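
The pulse-detection bound (44) used in this argument can be evaluated numerically. The Python sketch below is our own illustration: it assumes unit-variance Gaussian noise, a received pulse amplitude of $2\sqrt{4 \ln U_k}$ and a threshold of $\sqrt{4 \ln U_k}$, computes the union bound with the exact Gaussian tail, and compares it with the closed form $2(U_k^{-1} + U_k^{-2})$.

```python
import math


def gaussian_tail(x):
    """Q(x) = Pr(Z > x) for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pulse_detection_error_bound(list_size, u_k):
    """Union bound on the pulse-detection error at two relays.

    False alarm: noise alone exceeds the threshold at one of `list_size` candidate
    times at either relay. Miss: noise pulls the received pulse (amplitude twice the
    threshold) below the threshold at either relay. Both events have the same tail.
    """
    if u_k <= 1.0:
        raise ValueError("u_k must exceed 1 so that the threshold is positive")
    threshold = math.sqrt(4.0 * math.log(u_k))
    tail = gaussian_tail(threshold)
    return 2 * list_size * tail + 2 * tail


if __name__ == "__main__":
    for u_k in (2.0, 10.0, 1000.0):
        exact = pulse_detection_error_bound(int(u_k), u_k)
        closed_form = 2.0 * (1.0 / u_k + 1.0 / u_k ** 2)
        print(u_k, exact <= closed_form)
```

The closed form relies on the standard tail estimate $\Pr(Z > x) \le e^{-x^2/2}$ together with a list size of at most $U_k$.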

The previous arguments imply that if, for some $\alpha > 0$ and some non-negative sequence $\{\epsilon_k\}$ with
$\epsilon_k \to 0$, we have (39), then the sequence of codes $\{C_k\}_{k=1}^{\infty}$ can be converted into another sequence of
codes achieving the same energy-per-bit where both relays can have a list decoder $\Lambda_k$ for $\nu_{B_k}$ (where
$|\Lambda_k|$ is subexponential in $B_k$) with vanishing error probability. In this case, we fall into case (a).

Therefore, for case (b) we only need to consider sequences of codes $\{C_k\}$ such that, for all $\alpha > 0$
and all non-negative sequences $\{\epsilon_k\}$ with $\epsilon_k \to 0$, the sequence of codes built according to Lemma 8
satisfies (40). Thus, we assume that, for any $\alpha > 0$ and any $\{\epsilon_k\}$ with $\epsilon_k \to 0$, we have a sequence of
codes $\{C_k\}_{k=1}^{\infty}$ achieving the same energy-per-bit $\epsilon_b$ for which (40) holds.

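Our main modification will be to restrict the messages that our code $C_k$ can send. First we consider
the set

\[
\mathcal{L}_2(\alpha, C_k, \epsilon_k) = \left\{ t \notin \mathcal{T}_2(\alpha, C_k, \epsilon_k) : \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t \big) \le \sqrt{\Pr(\text{error}(C_k))} \right\} .
\]

To simplify the notation, we will refer to the sets $\mathcal{T}_2(\alpha, C_k, \epsilon_k)$ and $\mathcal{L}_2(\alpha, C_k, \epsilon_k)$ by simply $\mathcal{T}_2$ and $\mathcal{L}_2$.
We notice that

\begin{align*}
\Pr(\text{error}(C_k)) &\ge \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \\
&\ge \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} \notin \mathcal{T}_2, \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \mid \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \\
&\ge \sqrt{\Pr(\text{error}(C_k))}\, \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \mid \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big),
\end{align*}

which implies that

\begin{align}
\Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) &= \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \mid \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{T}_2 \big) + \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \mid \hat{\nu}_{B_k} \in \mathcal{T}_2 \big) \Pr\big( \hat{\nu}_{B_k} \in \mathcal{T}_2 \big) \nonumber \\
&\le \sqrt{\Pr(\text{error}(C_k))} + \Pr\big( \hat{\nu}_{B_k} \in \mathcal{T}_2 \big). \tag{46}
\end{align}

From (40), (46) and the fact that $\Pr(\text{error}(C_k)) \to 0$ as $k \to \infty$, we conclude that

\begin{equation}
\lim_{k \to \infty} \Pr\big( \hat{\nu}_{B_k} \in \mathcal{L}_2(\alpha, C_k, \epsilon_k) \big) = 1. \tag{47}
\end{equation}
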

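We will now use the set $\mathcal{L}_2(\alpha, C_k, \epsilon_k)$ to define the messages that can be sent by our code. Since the
sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$ can be chosen arbitrarily, we fix it to be $\epsilon_k = \max(1/B_k, \xi_k^{1/8})$, where we
define $\xi_k = \Pr(\text{error}(C_k))$. If the set of messages for code $C_k$ is $\mathcal{M} = \{1, 2, \ldots, 2^{B_k}\}$, then we will let,
for each $t \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$,

\begin{equation}
\mathcal{M}_t = \left\{ m \in \mathcal{M} : \Pr\left( \frac{\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})]}{B_k} < \alpha \;\middle|\; \hat{\nu}_{B_k} = t, m \text{ is sent} \right) > \frac{\epsilon_k}{2} \right\} . \tag{48}
\end{equation}

In order to lower bound the size of $\mathcal{M}_t$, we notice that, for $t \in \mathcal{L}_2(\alpha, C_k, \epsilon_k) \subset S_k \setminus \mathcal{T}_2(\alpha, C_k, \epsilon_k)$,

\begin{align}
\epsilon_k &< \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t \big) \nonumber \\
&= \sum_{m=1}^{2^{B_k}} 2^{-B_k} \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \nonumber \\
&= \sum_{m \in \mathcal{M}_t} 2^{-B_k} \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \nonumber \\
&\qquad + \sum_{m \notin \mathcal{M}_t} 2^{-B_k} \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \nonumber \\
&\le |\mathcal{M}_t| 2^{-B_k} + \big( 2^{B_k} - |\mathcal{M}_t| \big) 2^{-B_k} \frac{\epsilon_k}{2} \le |\mathcal{M}_t| 2^{-B_k} + \frac{\epsilon_k}{2} . \tag{49}
\end{align}

Therefore, we have

\begin{equation}
|\mathcal{M}_t| \ge 2^{B_k} \frac{\epsilon_k}{2} . \tag{50}
\end{equation}
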



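Next we notice that for any $t \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$, we have

\begin{align*}
\xi_k^{1/2} &\ge \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t \big) \\
&\ge \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big) \Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t \big) \\
&> \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big)\, \epsilon_k,
\end{align*}

which implies that

\begin{equation}
\Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big) < \frac{\xi_k^{1/2}}{\epsilon_k} \le \xi_k^{3/8} . \tag{51}
\end{equation}

We can now write

\begin{align}
\xi_k^{3/8} &\ge \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big) \nonumber \\
&\ge \sum_{m \in \mathcal{M}_t} \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \nonumber \\
&\qquad \times \Pr\big( m \text{ is sent} \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big). \tag{52}
\end{align}

Then we notice that, for any $m \in \mathcal{M}_t$, for $t \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$, we have

\begin{align}
&\Pr\big( m \text{ is sent} \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t \big) \nonumber \\
&\quad = \frac{\Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid m \text{ is sent}, \hat{\nu}_{B_k} = t \big) \Pr\big( m \text{ is sent} \mid \hat{\nu}_{B_k} = t \big)}{\Pr\big( \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k \mid \hat{\nu}_{B_k} = t \big)} \ge \frac{\epsilon_k}{2} 2^{-B_k} . \tag{53}
\end{align}

Now, if we define

\[
\mathcal{M}'_t = \left\{ m \in \mathcal{M}_t : \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t, m \text{ is sent} \big) < \frac{8\xi_k^{1/4}}{\epsilon_k} \right\},
\]

we can use (52) and (53) to obtain

\begin{align*}
\xi_k^{3/8} &\ge \frac{\epsilon_k}{2} \sum_{m \in \mathcal{M}_t} 2^{-B_k} \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \\
&\ge \frac{\epsilon_k}{2} \sum_{m \in \mathcal{M}_t \setminus \mathcal{M}'_t} 2^{-B_k} \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t, m \text{ is sent} \big) \\
&\ge \frac{\epsilon_k}{2} 2^{-B_k} |\mathcal{M}_t \setminus \mathcal{M}'_t| \frac{8\xi_k^{1/4}}{\epsilon_k} \ge 2^{-B_k} |\mathcal{M}_t \setminus \mathcal{M}'_t|\, 4\xi_k^{1/4},
\end{align*}

and, thus, using (50) and $\xi_k^{1/8} \le \epsilon_k$,

\[
|\mathcal{M}'_t| \ge 2^{B_k} \frac{\epsilon_k}{4} \ge \frac{2^{B_k}}{4B_k} .
\]

Moreover, for any $m \in \mathcal{M}'_t$, we have

\begin{equation}
\Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t, m \text{ is sent} \big) < \frac{8\xi_k^{1/4}}{\epsilon_k} \le 8\xi_k^{1/8}, \tag{54}
\end{equation}

which goes to 0 as $k \to \infty$.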

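For any effective arrival time $\hat{\nu}_{B_k} = t_i \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$, we can fix a mapping $\phi_{t_i}$ from $\{1, \ldots, \frac{2^{B_k}}{4B_k}\}$
onto a subset of $\mathcal{M}'_{t_i}$ with $\frac{2^{B_k}}{4B_k}$ messages. We will choose the $\frac{2^{B_k}}{4B_k}$ messages $m$ with the smallest values
of

\[
\Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t, m \text{ is sent} \big),
\]

and we will call this subset $\mathcal{M}''_t$. Notice that this choice implies that, for $m \in \mathcal{M}''_t$,

\begin{align}
\Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t, m \text{ is sent} \big) &\le \sum_{m' \in \mathcal{M}'_t} |\mathcal{M}'_t|^{-1} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t, m' \text{ is sent} \big) \nonumber \\
&\le |\mathcal{M}'_t|^{-1} \sum_{m' \in \mathcal{M}} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t, m' \text{ is sent} \big) \nonumber \\
&= 2^{B_k} |\mathcal{M}'_t|^{-1} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t \big) \nonumber \\
&\le 4\epsilon_k^{-1} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t \big). \tag{55}
\end{align}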

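For an effective arrival time $\hat{\nu}_{B_k} = t_i \notin \mathcal{L}_2(\alpha, C_k, \epsilon_k)$, we fix any injective mapping $\phi_{t_i}$ from $\{1, \ldots, \frac{2^{B_k}}{4B_k}\}$
onto a subset of $\mathcal{M}$ with $\frac{2^{B_k}}{4B_k}$ messages. We will build code $C'_k$ with $B'_k = B_k - \log 4B_k$, as the
restriction of code $C_k$ according to $\phi_{t_i}$. We can now upper bound the error probability of this new
scheme as

\begin{align*}
\Pr(\text{error}(C'_k)) &= \sum_{i=1}^{|S_k|} |S_k|^{-1} \Pr\big( \text{error}(C'_k) \mid \hat{\nu}_{B_k} = t_i \big) \\
&\le \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + \sum_{t_i \in \mathcal{L}_2} |S_k|^{-1} \Pr\big( \text{error}(C'_k) \mid \hat{\nu}_{B_k} = t_i \big) \\
&= \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + \sum_{t_i \in \mathcal{L}_2} \sum_{m \in \mathcal{M}''_{t_i}} |S_k|^{-1} 2^{-B'_k} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t_i, m \text{ is sent} \big) \\
&\overset{(i)}{\le} \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + 4\epsilon_k^{-1} \sum_{t_i \in \mathcal{L}_2} |S_k|^{-1} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t_i \big) \\
&\le \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + 4\epsilon_k^{-1} \sum_{i=1}^{|S_k|} |S_k|^{-1} \Pr\big( \text{error}(C_k) \mid \hat{\nu}_{B_k} = t_i \big) \\
&= \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + 4\epsilon_k^{-1} \Pr(\text{error}(C_k)) \\
&\le \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}_2 \big) + 4\xi_k^{7/8},
\end{align*}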

where (i) follows from (55). Thus, $\Pr(\text{error}(C'_k)) \to 0$ as $k \to \infty$. Moreover, the restriction $C'_k$ achieves
a causal energy-per-bit

\[
\liminf_{k \to \infty} \frac{E\big[ \mathcal{E}_{C'_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] \big]}{B_k - \log 4B_k} = \liminf_{k \to \infty} \frac{E\big[ \mathcal{E}_{C'_k}[1 : \hat{\nu}_{B_k} + \ell_k - 1] \big]}{B_k} \overset{(i)}{\le} \epsilon_b,
\]

where (i) follows since the sequence of codes $\{C_k\}_{k=1}^{\infty}$ was assumed to achieve causal energy-per-bit $\epsilon_b$
uniformly over the messages. We will now consider using code $C'_k$ in the network in Figure 8.

Figure 8: Contracted two-relay diamond network.

In this network, the source possesses two antennas, $A_1$ and $A_2$. This network should be thought of as
the previous diamond network when we allow the source and relay 2 to cooperate. Therefore, it is clear
that any code used for the diamond network in Figure 1 can be used in the network in Figure 8 by simply
having the source simulate the received signals at relay 2 and using its relaying functions to compute the
transmit signals for $A_2$. Clearly, the error probability and the expected energy when applying $C_k$ to this
new network are identical to those for the original network.

Working in the network in Figure 8, the first modification we can make to the sequence of codes $\{C'_k\}$
is to have them achieve energy-per-bit $\epsilon_b$, rather than causal energy-per-bit $\epsilon_b$. This can be done since,
according to Theorem 6, relay 1 can have a list decoder for $\hat{\nu}_{B_k}$ with a list size that is subexponential
in $B_k$. Let $\Lambda_k$ be this list, and $U_k$ a subexponential function of $B_k$ satisfying $U_k \to \infty$ as $k \to \infty$
and $|\Lambda_k|/U_k \to 0$ as $k \to \infty$. Similar to our previous argument, we can have the source send a pulse
of magnitude $2\sqrt{4 \ln U_k / g_1}$ at time $\hat{\nu}_{B_k} + \ell_k$, and have relay 1 use a threshold detector with threshold
$\sqrt{4 \ln U_k}$ at time $t_i + \ell_k$, as long as $t_i$ was added to the list. If the pulse is detected, relay 1 can halt its
transmissions. Moreover, since relay 2 and source are now together, relay 2 can stop its transmissions
at time $\hat{\nu}_{B_k} + \ell_k$ without having to detect any pulse. We call the resulting code $\{C''_k\}$. Following (44),
it can be shown that the probability of error in decoding $\hat{\nu}_{B_k}$ at relay 1 goes to 0, and by following the
argument in (45), it can be shown that our modified sequence of codes $\{C''_k\}$ achieves an energy-per-bit
$\epsilon_b$, and allows relay 1 to decode $\hat{\nu}_{B_k}$ exactly with vanishing error probability.

The next modification we will make is to allow $A_2$ to stay silent up to time $\hat{\nu}_{B_k}$. Notice that this
does not happen when we use code $C''_k$ in the network from Figure 8, since the source is simulating what
relay 2 would be sending during each transmission block on the actual diamond network. However, it is
intuitive that the signals transmitted up to the time of the effective arrival time $\hat{\nu}_{B_k}$ are "useless" to the
destination since they are independent of the actual message, and all they may be doing is preventing
the destination from making a false alarm at the end of a transmission block prior to $\hat{\nu}_{B_k}$. In order
to fix that, we notice that, during transmission blocks prior to $\hat{\nu}_{B_k}$, all the source is doing is drawing
noise sequences for the received signals at relay 2, computing the corresponding outputs for relay 2 and
transmitting them on $A_2$. Therefore, we will draw $|S_k|$ i.i.d. $\mathcal{N}(0, 1)$ noise sequences of length $\ell_k$ prior
to the communication session and share them among source and destination. This way, instead of source
transmitting what relay 2 would have transmitted on a transmission block before $\hat{\nu}_{B_k}$ on antenna $A_2$, the
destination can compute what relay 2 would have transmitted during the transmission blocks and add
that (multiplied by $\sqrt{h_2}$) to its received signal. Notice that, during the transmission blocks prior to $\hat{\nu}_{B_k}$,
the statistics of this modified received signal at the destination will be the same as in the code $C''_k$, and $A_2$
will not be transmitting anything. However, we need to be careful once we get to the actual transmission
block $W(\hat{\nu}_{B_k})$. Since the destination does not know $\hat{\nu}_{B_k}$, it will once again add to its received signals
the contribution of what relay 2 would have transmitted if it had received just noise. To compensate for
this, recall that the noise sequences used by the destination to compute the contribution of relay 2 were
shared with the source. Therefore, the source can compute the output of the relay as well, and transmit
its opposite on $A_2$, thus cancelling the addition done by the destination. Notice that the result of this
operation is that antenna $A_2$ will stay silent in all the transmission blocks, except for $W(\hat{\nu}_{B_k})$. However,
$A_2$ will utilize (possibly) more energy during $W(\hat{\nu}_{B_k})$ than it would have in code $C_k$, since it will have
to add to its transmit signals a compensation sequence, i.e., a sequence of signals that will cancel the
effect of the addition performed by the destination. Later we will show that this extra energy is in fact
negligible.
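
The cancellation mechanism described above can be illustrated with a toy calculation. The Python sketch below is ours and makes simplifying assumptions: it tracks only the $A_2$-to-destination path with gain $h_2$, treats signals as short real-valued lists, and uses hypothetical function names.

```python
import math


def a2_transmit(actual_output, virtual_output):
    """Signal sent on antenna A2 during the true transmission block: the intended
    relay-2 output plus the compensation sequence (the negated virtual output)."""
    return [x - v for x, v in zip(actual_output, virtual_output)]


def destination_effective_signal(a2_signal, virtual_output, h2, noise):
    """What the destination works with after adding sqrt(h2) times the virtual
    relay-2 output (computed from the shared noise sequences) to what it received."""
    root_h2 = math.sqrt(h2)
    return [root_h2 * s + z + root_h2 * v
            for s, v, z in zip(a2_signal, virtual_output, noise)]


if __name__ == "__main__":
    h2 = 2.0                      # hypothetical gain from A2 to the destination
    actual = [1.0, -2.0, 0.5]     # relay-2 output the source wants delivered
    virtual = [0.3, 0.7, -1.1]    # output relay 2 would produce from noise alone
    no_noise = [0.0, 0.0, 0.0]
    sent = a2_transmit(actual, virtual)
    print(destination_effective_signal(sent, virtual, h2, no_noise))
```

In blocks before the arrival, $A_2$ sends nothing and the destination's own addition reproduces the noise-only contribution; in the true block the compensation cancels that addition, leaving only the intended signal.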

Now we need to describe the signal that the source will send on $A_2$ during $W(\hat{\nu}_{B_k})$. If the effective
arrival time is $\hat{\nu}_{B_k} \notin \mathcal{L}_2(\alpha, C_k, \epsilon_k)$, then $A_2$ will remain silent. If the effective arrival time is $\hat{\nu}_{B_k} = t_i$,
for $i > (1 - B_k^{-1})|S_k|$, $A_2$ will also remain silent. This will likely cause an error, but notice that, from
(47), the probabilities of these two events go to 0. Moreover, this modification can only decrease the
energy used. If the effective arrival time is $\hat{\nu}_{B_k} = t_i \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$ and $i \le (1 - B_k^{-1})|S_k|$, then
the source will repeatedly simulate what the output of relay 2 would have been during $W(\hat{\nu}_{B_k})$ on the
diamond network from Figure 1 using code $C_k$, until it finds one output signal sequence $X^{\ell_k}$ satisfying
$\sum_{i=1}^{\ell_k} X_i^2 \le \alpha B_k$. Notice that, from (48) and the fact that the non-overlapping transmission blocks
imply that the distribution of the transmit signals of relay 2 in $W(\hat{\nu}_{B_k})$ is independent of the previously
received signals, it should be possible to find such output signal. Moreover, notice that simulating the
source repeatedly until finding such an output signal is equivalent to drawing an output signal from the
distribution of $X[W(\hat{\nu}_{B_k})]$ of relay 2 conditioned on the fact that $\hat{\nu}_{B_k} = t_i$ and $\mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] \le \alpha B_k$.
The source will thus find such an output sequence, and transmit it on $A_2$ together with the compensation
sequence we mentioned previously. Therefore, for any effective arrival time $\hat{\nu}_{B_k} = t_i \in \mathcal{L}_2(\alpha, C_k, \epsilon_k)$
with $i \le (1 - B_k^{-1})|S_k|$, since the message we will be sending is from $\mathcal{M}''_{t_i}$, we have that the error
probability will satisfy (54). Now, if we let

\[
\mathcal{L}'_2(\alpha, C_k, \epsilon_k) = \left\{ t_i \in \mathcal{L}_2(\alpha, C_k, \epsilon_k) : i \le (1 - B_k^{-1})|S_k| \right\},
\]

it is clear that we have

\[
\Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}'_2(\alpha, C_k, \epsilon_k) \big) \to 0, \text{ as } k \to \infty.
\]

Thus, the error probability of this new code, which we will call $C'''_k$, satisfies



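\begin{align*}
\Pr(\text{error}(C'''_k)) &= \sum_{i=1}^{|S_k|} |S_k|^{-1} \Pr\big( \text{error}(C'''_k) \mid \hat{\nu}_{B_k} = t_i \big) \\
&\le \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}'_2 \big) + \sum_{t_i \in \mathcal{L}'_2} \sum_{m \in \mathcal{M}''_{t_i}} |S_k|^{-1} 2^{-B'_k} \Pr\big( \text{error}(C_k) \mid \mathcal{E}^{(r_2)}_{C_k}[W(\hat{\nu}_{B_k})] < \alpha B_k, \hat{\nu}_{B_k} = t_i, m \text{ is sent} \big) \\
&\overset{(i)}{\le} \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}'_2 \big) + \sum_{i=1}^{|S_k|} |S_k|^{-1}\, 8\xi_k^{1/8} \\
&= \Pr\big( \hat{\nu}_{B_k} \notin \mathcal{L}'_2 \big) + 8\xi_k^{1/8},
\end{align*}

where (i) follows from (54). We conclude that $\Pr(\text{error}(C'''_k)) \to 0$, as $k \to \infty$. We will let $\mathcal{E}^{(A_1, r_1)}_{C'''_k}$
be the energy spent by code $C'''_k$ at antenna $A_1$ and relay 1. Notice that antenna $A_1$ and relay 1 perform
exactly as the source and relay 1 perform when code $C''_k$ is used. Therefore we have

\begin{equation}
E\big[ \mathcal{E}^{(A_1, r_1)}_{C'''_k} \big] = E\big[ \mathcal{E}^{(A_1, r_1)}_{C''_k} \big] \le E\big[ \mathcal{E}_{C''_k} \big]. \tag{56}
\end{equation}
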

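Now we need to compute the expected energy consumed by antenna $A_2$. Let $V \in \mathbb{R}^{\ell_k}$ be the compen-
sation sequence that is added to the transmit signal of $A_2$ during $W(\hat{\nu}_{B_k})$, and let $X \in \mathbb{R}^{\ell_k}$ be the actual
transmit signal that the source draws, satisfying $\|X\|^2 \le \alpha B_k$, to be transmitted on $A_2$. Then if $\mathcal{E}^{(A_2)}_{C'''_k}$ is
the energy used by $A_2$, we have

\begin{align}
E\big[ \mathcal{E}^{(A_2)}_{C'''_k} \big] &= E\big[ \mathcal{E}^{(A_2)}_{C'''_k}[W(\hat{\nu}_{B_k})] \big] = E\left[ \sum_{i=1}^{\ell_k} (X_i + V_i)^2 \right] \nonumber \\
&\le E\left[ 2\sum_{i=1}^{\ell_k} X_i^2 + 2\sum_{i=1}^{\ell_k} V_i^2 \right] \nonumber \\
&= 2E\big[ \|X\|^2 \big] + 2E\big[ \|V\|^2 \big] \nonumber \\
&\le 2\alpha B_k + 2E\big[ \|V\|^2 \big]. \tag{57}
\end{align}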
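
The middle step of (57) is the elementary bound $(x + v)^2 \le 2x^2 + 2v^2$ applied coordinate by coordinate. A short Python check on random vectors (our own sketch; the names and test values are illustrative only):

```python
import random


def energy(sequence):
    """Sum of squares of a real sequence."""
    return sum(s * s for s in sequence)


def combined_energy_bound_holds(x, v, tol=1e-12):
    """Check that the energy of x + v is at most 2*energy(x) + 2*energy(v)."""
    combined = [a + b for a, b in zip(x, v)]
    return energy(combined) <= 2.0 * energy(x) + 2.0 * energy(v) + tol


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]
    v = [rng.gauss(0.0, 3.0) for _ in range(8)]
    print(combined_energy_bound_holds(x, v))
```

The bound is tight when the compensation sequence equals the transmit signal, so the factor of 2 cannot be improved in general.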



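In order to upper bound the value of $E[\|V\|^2]$, we recall that, if $\hat{\nu}_{B_k} = t_i$, $V$ is a shared random
sequence that was drawn as the output of relay 2 during $W(t_i)$, assuming that only noise was received
at relay 2, i.e., assuming $\hat{\nu}_{B_k} > t_i$. Moreover, if $\hat{\nu}_{B_k} = t_i$ with $i > (1 - B_k^{-1})|S_k|$, since $A_2$ stays silent,
we have $\|V\|^2 = 0$. Thus we obtain

\begin{align}
E\big[ \|V\|^2 \big] &= \sum_{i=1}^{(1 - B_k^{-1})|S_k| - 1} \Pr\big( \hat{\nu}_{B_k} = t_i \big)\, E\big[ \|V\|^2 \mid \hat{\nu}_{B_k} = t_i \big] \nonumber \\
&\le |S_k|^{-1} E\Big[ \mathcal{E}^{(r_2)}_{C''_k}\big[ 1 : t_{(1 - B_k^{-1})|S_k| - 1} + \ell_k - 1 \big] \,\Big|\, \hat{\nu}_{B_k} > t_{(1 - B_k^{-1})|S_k|} \Big] \nonumber \\
&\le |S_k|^{-1} E\Big[ \mathcal{E}_{C''_k} \,\Big|\, \hat{\nu}_{B_k} > t_{(1 - B_k^{-1})|S_k|} \Big] \nonumber \\
&\le |S_k|^{-1} \frac{E\big[ \mathcal{E}_{C''_k} \big]}{\Pr\big( \hat{\nu}_{B_k} > t_{(1 - B_k^{-1})|S_k|} \big)} \nonumber \\
&= |S_k|^{-1} B_k E\big[ \mathcal{E}_{C''_k} \big]. \tag{58}
\end{align}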



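Notice that $|S_k|$ is exponential in $B_k$ (since $A_k/|S_k|$ is subexponential in $B_k$) and thus $|S_k|^{-1} B_k \to 0$
as $k \to \infty$. Therefore, the energy-per-bit used on antenna $A_2$ satisfies

\begin{align}
\liminf_{k \to \infty} \frac{E\big[ \mathcal{E}^{(A_2)}_{C'''_k} \big]}{B_k - \log 4B_k} &\overset{(i)}{\le} \liminf_{k \to \infty} \frac{2\alpha B_k + 2|S_k|^{-1} B_k E\big[ \mathcal{E}_{C''_k} \big]}{B_k - \log 4B_k} \nonumber \\
&= \liminf_{k \to \infty} \left( \frac{2\alpha}{1 - \frac{\log 4B_k}{B_k}} + 2|S_k|^{-1} B_k \frac{E\big[ \mathcal{E}_{C''_k} \big]/B_k}{1 - \frac{\log 4B_k}{B_k}} \right) \nonumber \\
&\overset{(ii)}{=} 2\alpha, \tag{59}
\end{align}

where (i) follows from (57) and (58), and (ii) follows from the fact that $|S_k|^{-1} B_k \to 0$ as $k \to \infty$, and
$\liminf_{k \to \infty} E[\mathcal{E}_{C''_k}]/B_k \le \epsilon_b$. Clearly, we can find a subsequence of codes $\{C'''_{k_j}\}_{j=1}^{\infty}$, for which the
lim inf in (59) holds as limit, and for which $E[\mathcal{E}^{(A_2)}_{C'''_{k_j}}] \le 3\alpha(B_{k_j} - \log 4B_{k_j}) \le 3\alpha B_{k_j}$. Thus, by only
keeping the codes in $\{C'''_{k_j}\}$, we conclude that the sequence of codes $\{C'''_k\}$ can be used on the network
in Figure 8 with the additional constraint that any sequence of codes $\{C_k\}_{k=1}^{\infty}$ satisfies

\begin{equation}
E\big[ \mathcal{E}^{(A_2)}_{C_k} \big] \le 3\alpha B_k . \tag{60}
\end{equation}
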



Intuitively, if $\alpha$ is very small (recall that we could have fixed $\alpha > 0$ arbitrarily small), antenna $A_2$ should
not be very useful for the scheme. This idea is captured in the following Lemma, whose proof we present
in the Appendix.

Lemma 9. Consider the network shown in Figure 8 in the asynchronous setting. Suppose a sequence of
codes $\{C_k\}_{k=1}^{\infty}$ satisfies (60) and achieves a finite energy-per-bit. Then we must have

\[
\liminf_{k \to \infty} \frac{E[\mathcal{E}_{C_k}]}{B_k} \ge \gamma(1 + \beta)\left( \frac{1}{g_1} + \frac{1}{h_1} \right) - f(\alpha),
\]

where $f(\alpha)$ is a function satisfying $f(\alpha) \to 0$ as $\alpha \to 0$.

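Then we notice that the sequence of codes $\{C'''_k\}$, once restricted to the subsequence $\{C'''_{k_j}\}$, achieves
an energy-per-bit

\begin{align*}
\liminf_{k \to \infty} \frac{E\big[ \mathcal{E}_{C'''_k} \big]}{B_k - \log 4B_k} &= \liminf_{k \to \infty} \frac{E\big[ \mathcal{E}^{(A_1, r_1)}_{C'''_k} \big] + E\big[ \mathcal{E}^{(A_2)}_{C'''_k} \big]}{B_k - \log 4B_k} \\
&\le \liminf_{k \to \infty} \frac{E\big[ \mathcal{E}^{(A_1, r_1)}_{C'''_k} \big]}{B_k - \log 4B_k} + \lim_{k \to \infty} \frac{E\big[ \mathcal{E}^{(A_2)}_{C'''_k} \big]}{B_k - \log 4B_k} \\
&\overset{(i)}{\le} \liminf_{k \to \infty} \frac{E\big[ \mathcal{E}_{C''_k} \big]}{B_k - \log 4B_k} + 2\alpha \\
&\overset{(ii)}{\le} \epsilon_b + 3\alpha,
\end{align*}

where (i) follows from (59) (since the lim inf can be replaced with the limit) and the fact that $A_1$ and
$r_1$ perform identically in code $C'''_k$ and $C''_k$, and (ii) follows from the fact that code $C''_k$ achieves an
energy-per-bit $\epsilon_b$ on the network in Figure 8. Therefore, by applying Lemma 9, we conclude that

\[
\epsilon_b + 3\alpha \ge \gamma(1 + \beta)\left( \frac{1}{g_1} + \frac{1}{h_1} \right) - f(\alpha).
\]

Since this inequality should hold for any $\alpha > 0$, we may let $\alpha \to 0$ to obtain

\[
\epsilon_b \ge \gamma(1 + \beta)\left( \frac{1}{g_1} + \frac{1}{h_1} \right).
\]
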

Finally, we consider the energy-per-bit that can be achieved by using only relay 1. Once we remove relay
2, the network can be viewed as two concatenated point-to-point channels, i.e., the relay first functions
as a destination for the first hop, and then as a source for the second hop. The minimum energy-per-bit
for these channels, according to Theorem 1, is respectively

\[
\frac{\gamma(1 + \beta)}{g_1} \quad \text{and} \quad \frac{\gamma(1 + \beta)}{h_1} .
\]

Moreover, since the communication delay in each hop can be made subexponential in $B_k$, the total delay
(the sum of the two) will still be subexponential in $B_k$. Therefore, we conclude that the scheme using
only relay 1 achieves the same or smaller energy-per-bit than the scheme using both relays, and we are
in case (b). This concludes the proof of Theorem 4. □
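
The final comparison in the proof is an addition of two per-hop costs. As a minimal sketch (the function name and the numerical values are hypothetical; $\gamma$, $\beta$, $g_1$ and $h_1$ are the constants appearing in the text), the energy-per-bit of the relay-1-only scheme can be computed as follows:

```python
def two_hop_energy_per_bit(gamma, beta, g1, h1):
    """Energy-per-bit of the two concatenated point-to-point hops through relay 1:
    gamma * (1 + beta) / g1 for source-to-relay plus gamma * (1 + beta) / h1 for
    relay-to-destination."""
    if g1 <= 0 or h1 <= 0:
        raise ValueError("channel gains must be positive")
    first_hop = gamma * (1.0 + beta) / g1
    second_hop = gamma * (1.0 + beta) / h1
    return first_hop + second_hop


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formula.
    print(two_hop_energy_per_bit(gamma=2.0, beta=0.5, g1=1.0, h1=3.0))
```

This sum coincides with the lower bound obtained above from Lemma 9, which is why dropping relay 2 costs nothing in case (b).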



8 Concluding Remarks 

In this work we started studying the fundamental limits of energy-efficient communication in relay 
networks with bursty traffic. For the diamond relay network, we showed that the minimum energy- 
per-bit can be achieved with codes where each relay is either synchronized or not used. Intuitively, 
this result should not be surprising. The idea is that a relay that is not synchronized will most likely 
waste energy outside of the actual communication block and harm the achieved energy-per-bit. This 
result was then used to derive a lower bound for the asynchronous minimum energy-per-bit for the 
diamond network which allows us to prove that separation-based schemes are nearly optimal in high 
asynchronism regimes. 

But the intuition that a relay that is not synchronized cannot be helpful from an energy-per-bit point
of view extends beyond a simple diamond network. Thus, it seems reasonable to expect that a result
similar to Corollary 1 holds for general wireless networks. Such a result would have interesting
consequences. It would essentially imply that a separation-based scheme that synchronizes a certain optimal
subset of the relays can achieve close to the asynchronous minimum energy-per-bit. Moreover, it would
raise the questions of how to find the optimal subset of relays to be synchronized and what the correct
strategy for synchronization is. In a large non-layered network, it is not even clear in what order the
relays should be synchronized.

However, it should be noted that the techniques used to prove Theorem 4 cannot be easily extended 
to larger networks. In particular, we notice that Lemma 9 essentially implies that if we have a code 
where relay 2 uses very little energy, then it is possible to come up with a new code where relay 2 is not 
used at all. In order to prove that, we used the fact that the capacity (and, thus, the minimum energy- 
per-bit) of two concatenated point-to-point AWGN channels is known. However, even if we just wanted 
to extend this result to an $N$-relay diamond network, we would no longer be able to prove such a result,
since the capacity of the $(N-1)$-relay diamond network is not known. Therefore, new techniques must
be developed in order to prove that a relay that uses a negligible amount of energy-per-bit can in fact be
turned off, without affecting the performance of the coding scheme. 

Appendices 

I Proof of Theorem 1 



Theorem 1. For an asynchronous AWGN channel, the minimum energy-per-bit is given by

$$\epsilon_b^{\rm async} = (1 + \bar\beta)\,\epsilon_b^{\rm sync},$$

where $\bar\beta = \liminf_{k\to\infty} H(\nu_{B_k})/B_k$.



Proof. Converse: Consider an arbitrary sequence $\{C_k\}_{k=1}^{\infty}$ of asynchronous codes, achieving a finite
energy-per-bit $\epsilon_b$, and let the error probabilities be $\Pr(\mathrm{error}(C_k)) = \epsilon_k$, where $\epsilon_k \to 0$.

We consider using code $C_k$ in a synchronous AWGN channel. We let $X^{\tilde A_k}$ be a (discrete) random
vector of length $\tilde A_k = A_k + d_{B_k} - 1$ that has the distribution induced on the input of the channel
by first choosing a message $M$ uniformly at random from $\{1, \ldots, 2^{B_k}\}$, then choosing a time index $T$
from $\{1, \ldots, A_k\}$ according to the distribution of $\nu_{B_k}$, and then having the source and the destination
operate according to the asynchronous code $C_k$. Let $Y^{\tilde A_k}$ be the corresponding received signal at the
destination. Notice that, from $Y^{\tilde A_k}$, it is possible to decode $M$ and output a set
$\{t, t+1, \ldots, t + d_{B_k} - 1\}$ of consecutive time steps such that $T$ belongs to it with probability at
least $1 - \epsilon_k$. Moreover, notice that there is a one-to-one correspondence between values of $(T, M)$
and values of $X^{\tilde A_k}$, and therefore $H(X^{\tilde A_k}) = H(T) + B_k = H(\nu_{B_k}) + B_k$. If $C(P)$ is the
capacity of the synchronous AWGN channel with average power constraint $P$, we have

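$$
\begin{aligned}
C\!\left(E[\mathcal{E}_{C_k}]/\tilde A_k\right)
&= \sup_{f(x^n)} \frac{1}{n}\, I(X^n; Y^n) \;\ge\; \frac{1}{\tilde A_k}\, I\!\left(X^{\tilde A_k}; Y^{\tilde A_k}\right) \\
&\overset{(i)}{\ge} \frac{1}{\tilde A_k}\, I\!\left(X^{\tilde A_k}; Y^{\tilde A_k} \,\middle|\, T \bmod d_{B_k}\right)
 = \frac{1}{\tilde A_k}\left[ H\!\left(X^{\tilde A_k} \,\middle|\, T \bmod d_{B_k}\right) - H\!\left(X^{\tilde A_k} \,\middle|\, Y^{\tilde A_k}, T \bmod d_{B_k}\right) \right] \\
&\ge \frac{1}{\tilde A_k}\left[ H\!\left(X^{\tilde A_k}\right) - H(T \bmod d_{B_k}) - H\!\left(X^{\tilde A_k} \,\middle|\, Y^{\tilde A_k}, T \bmod d_{B_k}\right) \right] \\
&\ge \frac{1}{\tilde A_k}\left[ H(\nu_{B_k}) + B_k - \log d_{B_k} - H\!\left(X^{\tilde A_k} \,\middle|\, Y^{\tilde A_k}, T \bmod d_{B_k}\right) \right] \\
&\overset{(ii)}{\ge} \frac{1}{\tilde A_k}\left\{ H(\nu_{B_k}) + B_k - \log d_{B_k} - \left[ H(\epsilon_k) + \epsilon_k (1+\beta) B_k \right] \right\},
\end{aligned}
\qquad (61)
$$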
where $f(x^n)$ refers to any distribution on $X^n$; (i) follows since $T$ is a function of $X^{\tilde A_k}$ and thus
$Y^{\tilde A_k} \leftrightarrow X^{\tilde A_k} \leftrightarrow T \bmod d_{B_k}$; and (ii) follows from Fano's inequality, since from
$Y^{\tilde A_k}$ and $T \bmod d_{B_k}$ we can decode $X^{\tilde A_k}$ with probability of error smaller than $\epsilon_k$.
Inequality (61) implies that

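$$
\inf_{P>0} \frac{P}{C(P)}
\;\le\; \frac{E[\mathcal{E}_{C_k}]/\tilde A_k}{C\!\left(E[\mathcal{E}_{C_k}]/\tilde A_k\right)}
\;\le\; \frac{E[\mathcal{E}_{C_k}]/\tilde A_k}{\frac{1}{\tilde A_k}\left[ H(\nu_{B_k}) + B_k - \log d_{B_k} - H(\epsilon_k) - \epsilon_k(1+\beta)B_k \right]}
\;=\; \frac{E[\mathcal{E}_{C_k}]/B_k}{\frac{H(\nu_{B_k})}{B_k} + 1 - \frac{\log d_{B_k}}{B_k} - \frac{H(\epsilon_k)}{B_k} - \epsilon_k(1+\beta)}.
$$

Finally, using Lemma 1, we conclude that

$$
\liminf_{k\to\infty} \frac{E[\mathcal{E}_{C_k}]}{B_k} \;\ge\; \epsilon_b^{\rm sync}\, \liminf_{k\to\infty} \left( 1 + \frac{H(\nu_{B_k})}{B_k} \right) = (1+\bar\beta)\,\epsilon_b^{\rm sync},
$$

and, thus, $(1 + \bar\beta)\,\epsilon_b^{\rm sync} \le \epsilon_b^{\rm async}$. □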



II Proof of Theorem 2 

Theorem 2. The asynchronous minimum energy-per-bit for the network in Figure 1 satisfies

$$\epsilon_b^{\rm async} \le \gamma (1+\beta) \left( \frac{1}{g_2} + \frac{1}{h_1 + h_2} \right).$$



Proof. We construct a sequence of codes $\{C_k\}_{k=1}^{\infty}$, where $C_k$ transmits $B_k$ bits assuming arrival
distribution $\nu_{B_k}$, for any sequence $\{B_k\}_{k=1}^{\infty}$, where $B_k \to \infty$ as $k \to \infty$. We use a
separation-based scheme which achieves asynchronous energy-per-bit

$$(1+\delta)^2\, \gamma\, (1+\beta) \left( \frac{1}{g_2} + \frac{1}{h_1 + h_2} \right)$$

for an arbitrarily small $\delta > 0$. Similar to the scheme described in the achievability of Theorem 1, the
source sends a pulse as soon as the message arrives. This pulse is detected by the relays, which send
another pulse to the destination, taking advantage of beamforming. If relays and destination detect their
pulses correctly, the network becomes a synchronous network, and we may employ decode-and-forward
to communicate the $B_k$ bits.

More precisely, upon receiving the message, at time $\nu_{B_k}$, the source will transmit a pulse of magnitude

$$(1+\delta)\sqrt{\gamma \beta B_k / g_2},$$

for $\delta > 0$. This is analogous to the pulse used in the proof of Theorem 1 when $p_{B_k}(t) = 2^{-\beta B_k}$ for
$t \in [1 : A_k]$. Notice that we use $g_2$ in the denominator, since $g_2 \le g_1$, and we want to synchronize both
relays. Relay $i$ declares that the pulse is detected at time $t$ if $Y_i[t]$ is the first received signal larger than
$(1 + \delta/2)\sqrt{\gamma \beta B_k}$. By following the same steps in the achievability proof of Theorem 1, it is not
difficult to see that, for any $\delta > 0$, the probability of the relays not detecting the pulse tends to 0 as
$k \to \infty$.
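The threshold test above can be illustrated numerically. The sketch below is not part of the original proof: it assumes $\gamma = 2\ln 2$ (the minimum energy-per-bit of a unit-gain real AWGN channel with unit noise variance), assumes the arrival window has $2^{\beta B_k}$ slots, and uses the Chernoff bound $Q(x) \le e^{-x^2/2}$; the function names are hypothetical.

```python
import math

GAMMA = 2 * math.log(2)  # assumption: gamma = 2 ln 2 (unit-variance real AWGN)

def log2_false_alarm_bound(beta, bits, delta, gamma=GAMMA):
    """log2 of the union bound 2^(beta*bits) * exp(-thr^2 / 2) on a noise-only crossing."""
    thr_sq = (1 + delta / 2) ** 2 * gamma * beta * bits
    return beta * bits - thr_sq / (2 * math.log(2))

def miss_bound(beta, bits, delta, gamma=GAMMA):
    """Chernoff bound on the received pulse falling below the detection threshold."""
    # Received pulse mean is (1 + delta) * sqrt(gamma*beta*bits); threshold is (1 + delta/2) times it.
    margin = (delta / 2) * math.sqrt(gamma * beta * bits)
    return math.exp(-margin ** 2 / 2)

print(log2_false_alarm_bound(1.0, 1000, 0.1))  # negative exponent: bound decays with B_k
print(miss_bound(1.0, 1000, 0.1), miss_bound(1.0, 10000, 0.1))
```

Under these assumptions both bounds shrink as the number of bits grows, which matches the claim that the detection error vanishes as $k \to \infty$.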

If the relays correctly detect the pulse, they can transmit pulses to the destination in the next time
slot. Since they can use beamforming to reduce the total energy required, at time $\nu_{B_k} + 1$, each relay
will send a pulse of magnitude

(l + 5W(lPB k )/(4h). 
Again by following the steps in the proof of Theorem 1, it can be shown that if the destination declares
that a pulse has been detected when the received signal value exceeds $(1 + \delta/2)\sqrt{\gamma \beta B_k}$, the
probability that the destination does not detect the pulse correctly also tends to 0 as $k \to \infty$. The total
energy-per-bit consumed in the synchronization phase is

(l + <5) 2 7 /3 [1/0 + 1/(2/1)]. 

To compute the energy used in the communication phase, we notice that we can analyze the two 
hops separately, since we are employing decode-and-forward. For the first hop, the energy-per-bit can 
be chosen arbitrarily close to the minimum energy-per-bit of a point-to-point channel with channel gain 
$g_2$. Thus we choose the energy-per-bit used by the source to be $(1+\delta)^2 \gamma / g_2$. Since $g_1 \ge g_2$, both
relays are guaranteed to decode the message with high probability. For the second hop, we again use 
beamforming to reduce the energy-per-bit that is consumed. Thus, relay 1 and relay 2 will use the same 
codebook, but with different scaling coefficients. More precisely, relay i will use a codebook where the 
energy-per-bit of each codeword is at most 

This can be done by using Gaussian random codebooks and replacing the codewords that exceed the
energy-per-bit in (62) with zero codewords. Since this constraint is satisfied by every codeword, even
in the event that the pulse or the message from the source is not decoded by both relays, the
energy-per-bit consumed in the communication phase will be $(1+\delta)^2 \gamma \left( 1/g_2 + 1/(h_1 + h_2) \right)$, and the
total energy-per-bit is

$$(1+\delta)^2\, \gamma\, (1+\beta) \left( \frac{1}{g_2} + \frac{1}{h_1 + h_2} \right).$$

□ 



III Proof of Lemma 2 

Lemma 2. Consider the networks in Figures 2(a) and 2(b), where the message arrival time $\nu_B$ is
uniformly distributed in $[1 : 2^{\beta B}]$. Then, the minimum asynchronous energy-per-bit $\epsilon_b^{\min}$ of these two
networks is given respectively by

$$\epsilon_b^{\min} = (1+\beta)\,\frac{\gamma}{g_1 + g_2} \quad \text{and} \quad \epsilon_b^{\min} = (1+\beta)\,\frac{\gamma}{h_1 + h_2}.$$



Proof. Achievability. For the network in Figure 2(a), we consider having the destination do a
pre-processing on the received signals. From $Y_1 = \sqrt{g_1}\,X + Z_1$ and $Y_2 = \sqrt{g_2}\,X + Z_2$, the destination
will build an effective received signal

$$\tilde Y = \sqrt{\frac{g_1}{g_1 + g_2}}\; Y_1 + \sqrt{\frac{g_2}{g_1 + g_2}}\; Y_2 = \left( \sqrt{g_1 + g_2} \right) X + \tilde Z,$$

where $\tilde Z = \sqrt{\frac{g_1}{g_1+g_2}}\, Z_1 + \sqrt{\frac{g_2}{g_1+g_2}}\, Z_2 \sim \mathcal{N}(0, 1)$. Therefore, we now effectively have a
single-antenna point-to-point AWGN channel with channel gain $\sqrt{g_1 + g_2}$, and Theorem 1 guarantees that
the minimum energy-per-bit is at most $(1+\beta)\,\frac{\gamma}{g_1 + g_2}$. The same idea can be used to convert the channel
in 2(b) into a single-antenna point-to-point AWGN channel with channel gain $\sqrt{h_1 + h_2}$, by using a
pre-processing at the source.
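The combining step can be checked with a few lines of arithmetic. The sketch below is only an illustration (the function names are hypothetical): it computes the signal gain and noise variance of the weighted sum of $Y_1 = \sqrt{g_1}X + Z_1$ and $Y_2 = \sqrt{g_2}X + Z_2$, assuming $Z_1, Z_2$ are independent with unit variance.

```python
import math

def combine_weights(g1, g2):
    """Weights the destination applies to Y1 and Y2 before adding them."""
    total = g1 + g2
    return math.sqrt(g1 / total), math.sqrt(g2 / total)

def effective_channel(g1, g2):
    """Return (signal gain, noise variance) of the combined output."""
    w1, w2 = combine_weights(g1, g2)
    signal_gain = w1 * math.sqrt(g1) + w2 * math.sqrt(g2)  # coefficient multiplying X
    noise_variance = w1 ** 2 + w2 ** 2                      # independent unit-variance noises
    return signal_gain, noise_variance

gain, var = effective_channel(3.0, 1.0)
print(gain, var)  # gain is sqrt(g1 + g2), variance stays 1 (up to rounding)
```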
Converse. Notice that the same argument used in the converse of Theorem 1 can be used in order to
guarantee that if a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieves asynchronous energy-per-bit $\epsilon_b$, then we have

$$\inf_{P>0} \frac{P}{C(P)} \le \frac{\epsilon_b}{1+\beta},$$

where $C(P)$ is the capacity with average power constraint $P$ of either the network in Figure 2(a) or the
network in Figure 2(b). It is not difficult to see that we have $C(P) = \frac{1}{2}\log\left(1 + (g_1 + g_2)P\right)$ in the
former case and $C(P) = \frac{1}{2}\log\left(1 + (h_1 + h_2)P\right)$ in the latter case, thus establishing the result. □
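As a quick numerical sanity check of the quantity $\inf_{P>0} P/C(P)$ used in the converse, the sketch below evaluates it for $C(P) = \frac{1}{2}\log_2(1 + gP)$. Taking the logarithm in base 2 (so that energy is measured per bit) is an assumption of this illustration, and the function names are hypothetical.

```python
import math

def energy_per_bit(power, gain):
    """P / C(P) with C(P) = 0.5 * log2(1 + gain * P)."""
    return power / (0.5 * math.log2(1 + gain * power))

def min_energy_per_bit(gain):
    """Limit of P / C(P) as P -> 0, which is where the infimum is approached."""
    return 2 * math.log(2) / gain

gain = 4.0  # e.g. g1 + g2 or h1 + h2
for power in (1.0, 1e-2, 1e-4, 1e-6):
    print(power, energy_per_bit(power, gain))
print("limit:", min_energy_per_bit(gain))
```

The ratio decreases toward $2\ln 2 / g$ as $P \to 0$, which is consistent with a minimum energy-per-bit that scales inversely with the combined gain.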



IV Proof of Lemma 3 

Lemma 3. Consider the MIMO channel in Figure 3 in the asynchronous setting, where the message
arrival time $\nu_B$ is uniformly distributed in $[1 : 2^{\beta B}]$. Consider a sequence of codes $\{C_k\}_{k=1}^{\infty}$ that
achieves a finite energy-per-bit, and let $\mathcal{E}^{(s_i)}_{C_k}$ be the energy spent by code $C_k$ at the source
transmitter $s_i$, for $i = 1, 2$. Then, we must have

$$a \liminf_{k\to\infty} \frac{E\left[\mathcal{E}^{(s_1)}_{C_k}\right]}{B_k} + b \liminf_{k\to\infty} \frac{E\left[\mathcal{E}^{(s_2)}_{C_k}\right]}{B_k} \ge \gamma(1+\beta).$$



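Proof. Consider any sequence of codes $\{C_k\}_{k=1}^{\infty}$ for this channel achieving a finite energy-per-bit with
delay $d_{B_k}$. The expected energy used by code $C_k$ can be written as

$$
\begin{aligned}
E[\mathcal{E}_{C_k}] &= E[\mathcal{E}_{C_k} \mid \nu_{B_k} \le A_k/2] \Pr(\nu_{B_k} \le A_k/2) + E[\mathcal{E}_{C_k} \mid \nu_{B_k} > A_k/2] \Pr(\nu_{B_k} > A_k/2) \\
&= \tfrac{1}{2} E[\mathcal{E}_{C_k} \mid \nu_{B_k} \le A_k/2] + \tfrac{1}{2} E[\mathcal{E}_{C_k} \mid \nu_{B_k} > A_k/2],
\end{aligned}
$$

and we also have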



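$$
\begin{aligned}
a E\left[\mathcal{E}^{(s_1)}_{C_k}\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k}\right]
= {} & \tfrac{1}{2}\left( a E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right] \right) \\
& + \tfrac{1}{2}\left( a E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} > A_k/2\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} > A_k/2\right] \right).
\end{aligned}
$$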



Thus we must have either

$$a E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right] \le a E\left[\mathcal{E}^{(s_1)}_{C_k}\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k}\right]$$

or

$$a E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} > A_k/2\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} > A_k/2\right] \le a E\left[\mathcal{E}^{(s_1)}_{C_k}\right] + b E\left[\mathcal{E}^{(s_2)}_{C_k}\right].$$

Suppose the former case without much loss of generality. Then we will modify code $C_k$ to obtain a code
$C'_k$ that only uses $s_1$ in the following way. An arrival time $\nu_{B_k} \in \{2\ell - 1, 2\ell\}$ will correspond to an
arrival time $\nu_{B_k} = \ell$ in the original scheme, for $\ell = 1, \ldots, A_k/2$. If a message arrives at time
$\nu_{B_k} \in \{2\ell - 1, 2\ell\}$, the sequence of $d_{B_k}$ transmit signals on antenna $s_1$ will be sent at times
$2\ell + 1, 2(\ell+1) + 1, 2(\ell+2) + 1, \ldots, 2(\ell + d_{B_k}) + 1$. The sequence of $d_{B_k}$ transmit signals that should
be sent over antenna $s_2$ according to code $C_k$ will be sent on antenna $s_1$, multiplied by a factor
$\sqrt{b/a}$, at times $2\ell + 2, 2(\ell+1) + 2, 2(\ell+2) + 2, \ldots, 2(\ell + d_{B_k}) + 2$. Now the destination can simply
interpret the signals received at times $2\ell + 1$ and $2\ell + 2$, for $\ell = 1, \ldots, A_k/2$, as the signals received on
its two antennas. With this interpretation of the received signals, the destination can apply the same
decoder from code $C_k$. The delay of the new code is at most $d'_{B_k} = 2d_{B_k} + 2$, and its error probability
satisfies

$$\Pr(\mathrm{error}(C'_k)) = \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \le A_k/2) \le 2 \Pr(\mathrm{error}(C_k)),$$
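and also tends to 0 as $k \to \infty$. The energy used by code $C'_k$ satisfies

$$E\left[\mathcal{E}_{C'_k}\right] = E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right] + \frac{b}{a}\, E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right].$$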
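The time-interleaving construction can be sketched in code. This is only an illustration under stated assumptions: the scaling factor $\sqrt{b/a}$ is inferred from the $b/a$ factor in the energy expression, $a$ and $b$ are treated as power gains, and the function names and the index `ell` are hypothetical.

```python
import math

def interleave_on_one_antenna(x1, x2, a, b, ell):
    """Map the two per-antenna streams of the original code onto antenna s1 only.

    x1, x2: transmit sequences intended for s1 and s2; a, b: channel power gains;
    ell: index of the arrival pair {2*ell - 1, 2*ell}. Returns {time: amplitude}.
    """
    scale = math.sqrt(b / a)  # assumed factor: the s2 stream is received as if sent over gain b
    schedule = {}
    for n, (u, v) in enumerate(zip(x1, x2)):
        schedule[2 * (ell + n) + 1] = u          # odd slot carries the s1 stream
        schedule[2 * (ell + n) + 2] = scale * v  # even slot carries the rescaled s2 stream
    return schedule

def energy(schedule):
    """Total transmit energy of a schedule."""
    return sum(x * x for x in schedule.values())

sched = interleave_on_one_antenna([1.0, 2.0], [3.0, 4.0], a=4.0, b=1.0, ell=5)
print(sorted(sched), energy(sched))
```

In this example the total energy equals the energy of the first stream plus $b/a$ times the energy of the second, which is the structure of the expression above.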



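Since we now have a sequence of codes $C'_k$ that achieves a finite energy-per-bit on a point-to-point
channel with channel gain $a$, we must have, from Theorem 1,

$$
\begin{aligned}
& \liminf_{k\to\infty} \left( \frac{E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right]}{B_k} + \frac{b}{a}\, \frac{E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right]}{B_k} \right) \ge \frac{\gamma(1+\beta)}{a} \\
\Longrightarrow\; & \liminf_{k\to\infty} \left( \frac{a E\left[\mathcal{E}^{(s_1)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right]}{B_k} + \frac{b E\left[\mathcal{E}^{(s_2)}_{C_k} \,\middle|\, \nu_{B_k} \le A_k/2\right]}{B_k} \right) \ge \gamma(1+\beta) \\
\Longrightarrow\; & a \liminf_{k\to\infty} \frac{E\left[\mathcal{E}^{(s_1)}_{C_k}\right]}{B_k} + b \liminf_{k\to\infty} \frac{E\left[\mathcal{E}^{(s_2)}_{C_k}\right]}{B_k} \ge \gamma(1+\beta).
\end{aligned}
$$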



□ 



V Proof of Lemma 4 

Lemma 4. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$, where code $C_k$ operates on a channel with
uniform arrival distribution on $[1 : 2^{\beta B_k}]$ but only transmits $B_k - f(B_k)$ bits, with $f(\cdot) \ge 0$ and

$$\lim_{k\to\infty} \frac{f(B_k)}{B_k} = 0. \qquad (63)$$

Suppose that, in addition, this sequence of codes satisfies the following:

• $\lim_{k\to\infty} \Pr(\mathrm{error}(C_k)) = 0$

• $\lim_{k\to\infty} \frac{\log d_{B_k}}{B_k - f(B_k)} = 0$

• $\liminf_{k\to\infty} \frac{E[\mathcal{E}_{C_k}]}{B_k - f(B_k)} \le \epsilon_b$

Then, for any $\eta > 0$, this sequence can be used to construct a new sequence of codes $\{C'_k\}_{k=1}^{\infty}$, where
code $C'_k$ operates on a channel with uniform arrivals on $[1 : 2^{\beta B'_k}]$ and transmits $B'_k$ bits, satisfying

• $\lim_{k\to\infty} \Pr(\mathrm{error}(C'_k)) = 0$

• $\lim_{k\to\infty} \frac{\log d'_{B'_k}}{B'_k} = 0$

• $\liminf_{k\to\infty} \frac{E[\mathcal{E}_{C'_k}]}{B'_k} \le (1+\eta)\epsilon_b$,

i.e., $\{C'_k\}$ achieves an energy-per-bit $(1+\eta)\epsilon_b$ according to the original definition.

Proof. Notice that we may assume without loss of generality that $f(B_k) \to \infty$ as $k \to \infty$. Otherwise,
we can simply modify each code $C_k$ to transmit only $B_k - f(B_k) - \sqrt{B_k}$ bits (and the remaining
$\sqrt{B_k}$ bits can be chosen uniformly at random by the source, so that they are not message bits). Clearly,
if we define $\tilde f(B_k) = f(B_k) + \sqrt{B_k}$, then $\tilde f(B_k)$ still satisfies (63), and we have $\tilde f(B_k) \ge \sqrt{B_k}$ for
all $k$, and $\tilde f(B_k) \to \infty$ as $k \to \infty$.
We will let $B'_k = B_k - f(B_k)$, for $k = 1, 2, \ldots$. It is clear from (63) that $B'_k \to \infty$ as $k \to \infty$.
Then, we will use code $C_k$, which transmits $B_k - f(B_k)$ bits for an arrival distribution $\nu_{B_k}$, to create
the code $C'_k$ which transmits $B'_k$ bits for an arrival distribution $\nu_{B'_k}$. Notice that the arrival time
window assumed by code $C_k$, $W_k = [1 : 2^{\beta B_k}]$, must be modified into a shorter arrival time window
$W'_k = [1 : 2^{\beta B'_k}]$, and since $B_k - B'_k \to \infty$ as $k \to \infty$, $|W'_k|/|W_k| = 2^{-\beta(B_k - B'_k)} \to 0$ as $k \to \infty$. We
consider partitioning $W_k$ into $M = \lceil |W_k|/|W'_k| \rceil$ subintervals $I_j$, $j = 1, \ldots, M$. The $j$th subinterval is
$I_j = [(j-1)|W'_k| + 1 : j|W'_k|]$, for $j = 1, \ldots, M-1$, and $I_M = [(M-1)|W'_k| + 1 : |W_k|]$.

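Now we fix any $\eta > 0$. We will show that, for $k$ sufficiently large, we can find an interval $I_j$, with
$j < M$, such that

$$E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in I_j] \le (1+\eta) E[\mathcal{E}_{C_k}] \quad \text{and} \quad \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in I_j) \le \frac{2(1+\eta)}{\eta} \Pr(\mathrm{error}(C_k)).$$

To see this we first define the set $J = \{ j < M : E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in I_j] \le (1+\eta) E[\mathcal{E}_{C_k}] \}$. Then we have

$$
\begin{aligned}
E[\mathcal{E}_{C_k}] &\ge \sum_{j=1}^{M-1} \frac{|W'_k|}{|W_k|}\, E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in I_j] \;\ge\; \sum_{j \notin J,\, j \ne M} \frac{|W'_k|}{|W_k|}\, E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in I_j] \\
&\ge \sum_{j \notin J,\, j \ne M} \frac{|W'_k|}{|W_k|}\, (1+\eta) E[\mathcal{E}_{C_k}] = \left| \{1, \ldots, M-1\} \setminus J \right| \frac{|W'_k|}{|W_k|} (1+\eta) E[\mathcal{E}_{C_k}] \\
\Longrightarrow\; & \left| \{1, \ldots, M-1\} \setminus J \right| \le \frac{|W_k|}{(1+\eta)|W'_k|} \le \frac{M}{1+\eta}.
\end{aligned}
\qquad (64)
$$

Similarly, we define the set $J' = \left\{ j < M : \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in I_j) \le \frac{2(1+\eta)}{\eta} \Pr(\mathrm{error}(C_k)) \right\}$. Then
we obtain

$$
\begin{aligned}
\Pr(\mathrm{error}(C_k)) &\ge \sum_{j=1}^{M-1} \frac{|W'_k|}{|W_k|}\, \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in I_j) \;\ge\; \sum_{j \notin J',\, j \ne M} \frac{|W'_k|}{|W_k|}\, \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in I_j) \\
&\ge \sum_{j \notin J',\, j \ne M} \frac{|W'_k|}{|W_k|}\, \frac{2(1+\eta)}{\eta} \Pr(\mathrm{error}(C_k)) = \left| \{1, \ldots, M-1\} \setminus J' \right| \frac{|W'_k|}{|W_k|}\, \frac{2(1+\eta)}{\eta} \Pr(\mathrm{error}(C_k)) \\
\Longrightarrow\; & \left| \{1, \ldots, M-1\} \setminus J' \right| \le \frac{\eta |W_k|}{2(1+\eta)|W'_k|} \le \frac{M\eta}{2(1+\eta)}.
\end{aligned}
\qquad (65)
$$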

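From (64) and (65), we conclude that

$$
\begin{aligned}
|J \cap J'| &\ge M - 1 - \left| \{1, \ldots, M-1\} \setminus J \right| - \left| \{1, \ldots, M-1\} \setminus J' \right| \\
&\ge M - 1 - \frac{M}{1+\eta} - \frac{M\eta}{2(1+\eta)} = M \left( \frac{\eta}{2(1+\eta)} \right) - 1.
\end{aligned}
$$

Therefore, since $M \to \infty$ as $k \to \infty$, for $k$ sufficiently large, $J \cap J' \ne \emptyset$, implying that we can find our
desired subinterval $I_j$.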
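To make the counting argument concrete, the following sketch evaluates the lower bound $M\eta/(2(1+\eta)) - 1$ on $|J \cap J'|$ and the smallest number of subintervals for which it becomes strictly positive. It is an illustration only; the function names are hypothetical.

```python
import math

def guaranteed_good_intervals(num_intervals, eta):
    """Lower bound M * eta / (2 * (1 + eta)) - 1 on the size of the intersection of J and J'."""
    return num_intervals * eta / (2 * (1 + eta)) - 1

def smallest_sufficient_m(eta):
    """Smallest integer M for which the lower bound is strictly positive."""
    return math.floor(2 * (1 + eta) / eta) + 1

eta = 0.5
m = smallest_sufficient_m(eta)
print(m, guaranteed_good_intervals(m, eta))
```

Smaller values of $\eta$ require more subintervals, which is why the argument needs $M \to \infty$.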

Since $|I_j| = |W'_k|$, we will build code $C'_k$ from code $C_k$ by having $C_k$ operate as if it were in the
interval $I_j$. In order to do that, we consider drawing two sequences of $(j-1)|W'_k|$ i.i.d. $\mathcal{N}(0, 1)$ noise
values and sharing them among all relays and the destination. The relays will start operating at time 1
in the interval $W'_k$ as if they were in time $(j-1)|W'_k| + 1$ in code $C_k$ and had received, prior to that time,
their corresponding shared noise sequence. The destination will then use the relaying functions that the
relays would have used in code $C_k$ and apply them to the shared noise sequence of each relay, thus being
able to simulate what the relays would have transmitted in code $C_k$ prior to time $(j-1)|W'_k| + 1$. This
way the destination can simulate the signals it would have received prior to time $(j-1)|W'_k| + 1$, and
start operating at time 1 as if it were at time $(j-1)|W'_k| + 1$ in code $C_k$. Therefore, we see that, for this
new code $C'_k$,

$$\Pr(\mathrm{error}(C'_k)) = \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in I_j) \le \frac{2(1+\eta)}{\eta} \Pr(\mathrm{error}(C_k)),$$

which tends to 0 as $k \to \infty$, for any fixed $\eta > 0$. Moreover, our new code $C'_k$ will consume (in
expectation) the same energy that code $C_k$ would consume during $I_j$, conditioned on $\nu_{B_k} \in I_j$. But this
is at most the total energy consumed by $C_k$ conditioned on $\nu_{B_k} \in I_j$, and we have

$$E\left[\mathcal{E}_{C'_k}\right] \le E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in I_j] \le (1+\eta) E[\mathcal{E}_{C_k}].$$

By letting $d'_{B'_k} = d_{B_k}$, we conclude that our new sequence of codes $\{C'_k\}_{k=1}^{\infty}$ achieves an energy-per-bit
$(1+\eta)\epsilon_b$ and communicates $B'_k$ bits on a channel with arrival distribution $\nu_{B'_k}$. □



VI Proof of Lemma 5 

Lemma 5. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $\epsilon_b$ on the
asynchronous diamond network in Figure 1. Then we can build another sequence of codes $\{C'_k\}$ with
delay constraint $d'_{B_k}$ subexponential in $B'_k = B_k$, with non-overlapping transmission blocks of length
$\ell_k$, for which

$$\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{C'_k}[1 : \hat\nu_{B_k} + \ell_k - 1]\right]}{B_k} \le (1+\eta)\epsilon_b,$$

for any arbitrarily small $\eta > 0$, and whose probability of error goes to 0 as $k \to \infty$.

Proof. Our first step is to "delay" the entire coding scheme by $2d_{B_k}$. In order to do this, we shift all
the encoding functions at the source, the relaying functions at the relays and the decoding functions
at the destination by $2d_{B_k}$ time steps. More specifically, suppose the message $m$ arrives at time $\nu_{B_k}$,
and let $\mathrm{Enc}_t(\nu_{B_k}, m)$, for $t = \nu_{B_k}, \nu_{B_k} + 1, \ldots, \nu_{B_k} + d_{B_k} - 1$, be the signals transmitted by the
source during $[\nu_{B_k} : \nu_{B_k} + d_{B_k} - 1]$. After the delaying operation, if the message arrives at time
$\nu_{B_k}$, the source will wait until time $\nu_{B_k} + 2d_{B_k}$ and transmit $\mathrm{Enc}_t(\nu_{B_k}, m)$ at time $t + 2d_{B_k}$ for
$t = \nu_{B_k}, \nu_{B_k} + 1, \ldots, \nu_{B_k} + d_{B_k} - 1$. Similarly, if according to code $C_k$ relay $i$ transmitted, at time $t$,
$X_i[t] = f_t(Y_i[1], Y_i[2], \ldots, Y_i[t-1])$ (where $Y_i[t]$ is the received signal at relay $i$ at time $t$), then after
the delaying operation relay $i$ will transmit, at time $t + 2d_{B_k}$,
$X_i[t + 2d_{B_k}] = f_t(Y_i[2d_{B_k} + 1], \ldots, Y_i[2d_{B_k} + t - 1])$. Finally, if the destination, at time $t$, made its
detection/decoding using a function $g_t(Y_d[1], \ldots, Y_d[t-1])$ according to code $C_k$, then it will, at time
$t + 2d_{B_k}$, make its detection/decoding by computing $g_t(Y_d[2d_{B_k} + 1], Y_d[2d_{B_k} + 2], \ldots, Y_d[2d_{B_k} + t - 1])$.
Clearly if we increase the delay constraint from $d_{B_k}$ to $3d_{B_k}$, this new code has the exact same error
probability as $C_k$. We will refer to this delayed version of code $C_k$ as $C'_k$.

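Next, consider partitioning the arrival interval $[1 : A_k]$ into consecutive blocks of length $d_{B_k}$, and let
$\mathcal{M}_1, \mathcal{M}_2 \subset [1 : A_k]$ correspond to arrival times that belong to odd blocks and even blocks respectively.
If we assume for simplicity that $A_k = 2q\,d_{B_k}$ for some $q \in \mathbb{Z}$, then the expected energy used by the
delayed code $C'_k$ can be written as

$$
\begin{aligned}
E[\mathcal{E}_{C'_k}] &= E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_1] \Pr(\nu_{B_k} \in \mathcal{M}_1) + E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_2] \Pr(\nu_{B_k} \in \mathcal{M}_2) \\
&= \tfrac{1}{2} E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_1] + \tfrac{1}{2} E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_2].
\end{aligned}
$$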
Therefore, we must have $E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_1] \le E[\mathcal{E}_{C'_k}]$ or $E[\mathcal{E}_{C'_k} \mid \nu_{B_k} \in \mathcal{M}_2] \le E[\mathcal{E}_{C'_k}]$. Let us
suppose, without loss of generality, that the former is true. Then we will pick our set of transmission
times $S \subset \mathcal{M}_1 + 2d_{B_k}$. More specifically, since we have a total of $q$ length-$d_{B_k}$ blocks in $\mathcal{M}_1$, we select
our transmission times to be $t_i$, $i = 1, \ldots, q$, where $t_i \in [2i\,d_{B_k} + 1 : (2i+1)d_{B_k}]$. Notice that this
guarantees that any two consecutive transmission times will be separated by at least $d_{B_k}$ time steps and
at most $3d_{B_k}$ time steps. We will set $I_i = [2(i-1)d_{B_k} + 1 : 2i\,d_{B_k}]$, for $i = 1, \ldots, q$, and $\ell_k = d_{B_k}$.

For a given choice of transmission times $S = \{t_1, \ldots, t_q\}$, the source will perform as follows. If the
message arrives in the time window $I_i = [2(i-1)d_{B_k} + 1 : 2i\,d_{B_k}]$, then the source will wait until time
$t_i$ (which occurs after the end of $I_i$) to transmit it. This mapping operation performed by the source is
depicted in Figure 9. Clearly, for any choice of $S$, the decoding delay will be at most $4d_{B_k}$, which is
subexponential in $B_k$.

Figure 9: Mapping performed by the source from $\nu_{B_k}$ to $\hat\nu_{B_k}$.
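The choice of transmission times and the arrival-to-transmission mapping can be sketched as follows. This is a small illustration with hypothetical function names and toy values of $q$ and $d_{B_k}$, not the construction's actual parameters.

```python
import random

def pick_transmission_times(q, d, rng):
    """Draw t_i uniformly from [2*i*d + 1 : (2*i + 1)*d] for i = 1, ..., q."""
    return [rng.randint(2 * i * d + 1, (2 * i + 1) * d) for i in range(1, q + 1)]

def transmission_time(arrival, times, d):
    """Map an arrival in I_i = [2*(i-1)*d + 1 : 2*i*d] to its transmission time t_i."""
    i = (arrival - 1) // (2 * d) + 1
    return times[i - 1]

rng = random.Random(0)
q, d = 5, 4  # toy values: q blocks, delay d
times = pick_transmission_times(q, d, rng)
gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
waits = [transmission_time(v, times, d) - v for v in range(1, 2 * q * d + 1)]
print(times, min(gaps), max(gaps), max(waits))
```

Every gap between consecutive transmission times lies between $d$ and $3d$, and every arrival waits fewer than $3d$ steps before transmission starts, so the total decoding delay stays within $4d$.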

For each choice of $S$, we will let $C_k^S$ be the resulting code with delay constraint $d'_{B_k} = 4d_{B_k}$, and
we consider choosing $S$ uniformly at random. This is equivalent to picking each $t_i$ independently and
uniformly at random from $[2i\,d_{B_k} + 1 : (2i+1)d_{B_k}]$, for $i = 1, \ldots, q$. For a given $S$, the expected energy
of the new coding scheme is given by

$$
\begin{aligned}
E[\mathcal{E}_{C_k^S}] &= \sum_{t} E[\mathcal{E}_{C_k^S} \mid \hat\nu_{B_k} = t] \Pr(\hat\nu_{B_k} = t) \\
&\overset{(i)}{=} \frac{1}{q} \sum_{t \in S} E[\mathcal{E}_{C'_k} \mid \nu_{B_k} = t - 2d_{B_k}] \\
&\overset{(ii)}{=} \frac{1}{q} \sum_{t \in S} E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t - 2d_{B_k}],
\end{aligned}
\qquad (66)
$$

where (i) follows since, conditioned on $\hat\nu_{B_k} = t$, the new code $C_k^S$ performs exactly as $C'_k$ conditioned
on $\nu_{B_k} = t - 2d_{B_k}$ (due to the delay of $2d_{B_k}$), and (ii) follows because, for any $\nu_{B_k}$, $C_k$ and $C'_k$ spend
the same amount of energy in expectation (although at different times). When we consider averaging
over the ensemble of choices of $S$, we obtain

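$$
\begin{aligned}
E_S\!\left[ E[\mathcal{E}_{C_k^S}] \right] &= \frac{1}{d_{B_k}^{\,q}} \sum_{S} E[\mathcal{E}_{C_k^S}] = \frac{1}{q\, d_{B_k}^{\,q}} \sum_{S} \sum_{t \in S} E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t - 2d_{B_k}] \\
&= \frac{d_{B_k}^{\,q-1}}{q\, d_{B_k}^{\,q}} \sum_{t \in \mathcal{M}_1} E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t] = \frac{1}{q\, d_{B_k}} \sum_{t \in \mathcal{M}_1} E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t] \\
&\overset{(i)}{=} E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in \mathcal{M}_1] \le E[\mathcal{E}_{C_k}],
\end{aligned}
\qquad (67)
$$

with $E_S[\cdot]$ denoting the average over the uniformly random choice of $S$,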
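where (i) follows from the fact that

$$
\begin{aligned}
E[\mathcal{E}_{C_k} \mid \nu_{B_k} \in \mathcal{M}_1] &= \sum_{t \in \mathcal{M}_1} \Pr(\nu_{B_k} = t \mid \nu_{B_k} \in \mathcal{M}_1)\, E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t] \\
&= \sum_{t \in \mathcal{M}_1} \frac{\Pr(\nu_{B_k} = t)}{\Pr(\nu_{B_k} \in \mathcal{M}_1)}\, E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t] = \frac{1}{q\, d_{B_k}} \sum_{t \in \mathcal{M}_1} E[\mathcal{E}_{C_k} \mid \nu_{B_k} = t].
\end{aligned}
$$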

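Similar to (66), the probability of error of the new coding scheme for a fixed choice of $S$ satisfies

$$\Pr(\mathrm{error}(C_k^S)) = \frac{1}{q} \sum_{t \in S} \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} = t - 2d_{B_k}). \qquad (68)$$

Similar to (67), when we average over all choices of $S$, we obtain

$$
\begin{aligned}
E_S\!\left[ \Pr(\mathrm{error}(C_k^S)) \right] &= \frac{1}{d_{B_k}^{\,q}} \sum_{S} \Pr(\mathrm{error}(C_k^S)) = \frac{1}{q\, d_{B_k}^{\,q}} \sum_{S} \sum_{t \in S} \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} = t - 2d_{B_k}) \\
&= \frac{1}{q\, d_{B_k}} \sum_{t \in \mathcal{M}_1} \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} = t) \\
&= \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in \mathcal{M}_1) \overset{(i)}{\le} 2\epsilon_k,
\end{aligned}
\qquad (69)
$$

where, in (i), we use the fact that

$$
\begin{aligned}
\Pr(\mathrm{error}(C_k)) &= \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in \mathcal{M}_1) \Pr(\nu_{B_k} \in \mathcal{M}_1) + \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \notin \mathcal{M}_1) \Pr(\nu_{B_k} \notin \mathcal{M}_1) \\
&\ge \tfrac{1}{2} \Pr(\mathrm{error}(C_k) \mid \nu_{B_k} \in \mathcal{M}_1),
\end{aligned}
$$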
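and $\epsilon_k = \Pr(\mathrm{error}(C_k))$. Even though (67) implies the existence of a choice of $S$ for which
$E[\mathcal{E}_{C_k^S}] \le E[\mathcal{E}_{C_k}]$ and (69) implies the existence of a choice of $S$ for which $\Pr(\mathrm{error}(C_k^S)) \le 2\epsilon_k$,
where $\epsilon_k \to 0$ as $k \to \infty$, we have no guarantee that there exists an $S$ satisfying these two conditions
simultaneously. To fix this, we consider any small $\eta > 0$, and we notice that, when we choose $S$ uniformly
at random over all possibilities, from Markov's inequality and inequalities (67) and (69), we have

$$\Pr\left( E[\mathcal{E}_{C_k^S}] \ge (1+\eta) E[\mathcal{E}_{C_k}] \right) \le \frac{E_S\!\left[ E[\mathcal{E}_{C_k^S}] \right]}{(1+\eta) E[\mathcal{E}_{C_k}]} \le \frac{1}{1+\eta}, \qquad (70)$$

$$\Pr\left( \Pr(\mathrm{error}(C_k^S)) \ge \frac{4(1+\eta)}{\eta}\, \epsilon_k \right) \le \frac{E_S\!\left[ \Pr(\mathrm{error}(C_k^S)) \right] \eta}{4(1+\eta)\epsilon_k} \le \frac{\eta}{2(1+\eta)}. \qquad (71)$$

We now use the union bound, (70) and (71) to conclude that

$$
\begin{aligned}
\Pr\left( E[\mathcal{E}_{C_k^S}] < (1+\eta) E[\mathcal{E}_{C_k}] \text{ and } \Pr(\mathrm{error}(C_k^S)) < \frac{4(1+\eta)}{\eta}\, \epsilon_k \right) &\ge 1 - \frac{1}{1+\eta} - \frac{\eta}{2(1+\eta)} \\
&= \frac{\eta}{2(1+\eta)} > 0.
\end{aligned}
$$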

Therefore, for $\eta > 0$ arbitrarily small, there exists at least one set $S$ such that $E[\mathcal{E}_{C_k^S}] \le (1+\eta) E[\mathcal{E}_{C_k}]$
and $\Pr(\mathrm{error}(C_k^S)) \le \frac{4(1+\eta)}{\eta}\epsilon_k$. Notice that $\frac{4(1+\eta)}{\eta}\epsilon_k \to 0$ for any fixed $\eta > 0$. Thus, we will pick our
set of transmission times to be one such $S$, which we will refer to as $S_k$. Our new sequence of codes
achieves an energy-per-bit at most $(1+\eta)\epsilon_b$, for any arbitrarily small $\eta > 0$.
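The Markov-plus-union-bound argument holds for any finite ensemble of candidates, and this can be checked empirically. In the sketch below the "energies" and "error probabilities" are synthetic random numbers (an assumption made purely for illustration), and the thresholds are taken relative to the ensemble means, which corresponds to the factors $(1+\eta)$ and $2(1+\eta)/\eta$ applied to the averages in (67) and (69).

```python
import random

def fraction_jointly_good(energies, errors, eta):
    """Fraction of candidates strictly below both Markov thresholds."""
    mean_energy = sum(energies) / len(energies)
    mean_error = sum(errors) / len(errors)
    energy_cap = (1 + eta) * mean_energy
    error_cap = 2 * (1 + eta) / eta * mean_error
    good = sum(1 for e, p in zip(energies, errors) if e < energy_cap and p < error_cap)
    return good / len(energies)

rng = random.Random(1)
energies = [rng.expovariate(1.0) for _ in range(10_000)]  # synthetic nonnegative energies
errors = [rng.random() ** 4 for _ in range(10_000)]       # synthetic error probabilities
eta = 0.25
frac = fraction_jointly_good(energies, errors, eta)
floor = eta / (2 * (1 + eta))
print(frac, floor)
```

The observed fraction is never below $\eta/(2(1+\eta))$, so at least one candidate satisfies both conditions at once.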
Notice that, at a time $t \in [t_i : t_i + \ell_k - 1]$, if there is a transmission happening, it must have started
at $t_i$. Therefore, any detection made by the destination at a time $\tau \in [t_i : t_i + \ell_k - 1]$ can wait and be
outputted at time $t'_i = t_i + \ell_k - 1$. This will not affect the error probability, since the probability of late
decoding will not change. The advantage is that the destination will now only make decisions at the end
of a transmission block. Thus, if we let $c_i = t_i + d_{B_k} - 1$, for $i = 1, \ldots, |S_k|$, we can define the error
event

$$L_i = \{\tau = c_i,\ \hat\nu_{B_k} > t_i\} \cup \{\tau = c_i,\ \hat\nu_{B_k} = t_i,\ \hat m \ne m\} \cup \{\tau > c_i,\ \hat\nu_{B_k} = t_i\} \qquad (72)$$

for $i = 1, \ldots, |S_k|$. These three subevents can be read as false alarm at time $c_i$, wrong decoding at time
$c_i$ and missed detection at time $c_i$, respectively. It is easy to see that any error event corresponds to
$L_i$ for some $i \in \{1, \ldots, |S_k|\}$, and $L_i \cap L_j = \emptyset$ if $i \ne j$. Thus, the error probability of our code can be
written as

$$
\begin{aligned}
\Pr(\mathrm{error}(C_k^S)) &= \sum_{i=1}^{|S_k|} \Pr(L_i) \overset{(i)}{=} \sum_{i=1}^{|S_k|} \Pr\left( \bar L_{1:i-1}, L_i, \hat\nu_{B_k} \ge t_i \right) \\
&= \sum_{i=1}^{|S_k|} \Pr\left( L_i \mid \bar L_{1:i-1}, \hat\nu_{B_k} \ge t_i \right) \Pr\left( \bar L_{1:i-1}, \hat\nu_{B_k} \ge t_i \right) \\
&= \sum_{i=1}^{|S_k|} \Pr\left( L_i \mid \bar L_{1:i-1}, \hat\nu_{B_k} \ge t_i \right) \Pr(\hat\nu_{B_k} \ge t_i) \prod_{j=1}^{i-1} \Pr\left( \bar L_j \mid \bar L_{1:j-1}, \hat\nu_{B_k} \ge t_i \right),
\end{aligned}
\qquad (73)
$$

where (i) follows since $L_i$ implies $\hat\nu_{B_k} \ge t_i$ and $\bar L_{1:i-1} = \bar L_1 \cap \ldots \cap \bar L_{i-1}$. Then we notice that for each
$i \in \{1, \ldots, |S_k|\}$, conditioned on the fact that $\hat\nu_{B_k} \ge t_i$, the relays only received noise during time steps
$1, \ldots, t_i - 1$. Therefore, there is no actual information received up to time $t_i - 1$. Thus, intuitively, the
same performance should be achieved if, instead of using the actual noise received in $[1 : t_i - 1]$, the
relays used a random noise sequence of size $t_i - 1$ that is drawn before the communication session, and
shared among relays and destination. Notice that, in this case, the destination would not need to use its
received signals during times $[1 : t_i - 1]$, since it can simulate the output of the relays, and simulate the
AWGN channel between the relays and itself.

More formally, for each $t_i \in S_k$, we will draw two noise sequences of length $t_i - 1$ (one for each
relay) and share them among relays and destination. Then, during the transmission block
$[t_i : t_i + d_{B_k} - 1]$, the relays compute their outputs assuming that the received signals during times
$[1 : t_i - 1]$ were the corresponding noise sequence. During the same transmission block, the destination
simulates what its received signals would have been during times $[1 : t_i - 1]$ if the relays had in fact
received the shared noise sequence. This way, our resulting coding scheme will satisfy the third property
of a coding scheme with non-overlapping transmission blocks.

However, we still need to specify, for each $t_i$, the distribution from which we draw the noise sequence
for each relay. Somewhat surprisingly, the natural choice of drawing the noise sequences i.i.d. $\mathcal{N}(0, 1)$
does not work. Instead, we will use the intuition provided by (73) to define how we draw the noise
sequences. For a given $t_i$, let $N^1_{t_i - 1}, N^2_{t_i - 1} \in \mathbb{R}^{t_i - 1}$ be the random vectors associated to the received
signals at relays 1 and 2, conditioned on the fact that $\hat\nu_{B_k} \ge t_i$ (this guarantees that the relays actually
received just noise in $[1 : t_i - 1]$). To simplify the expressions, we define $N_{t_i - 1} = (N^1_{t_i - 1}, N^2_{t_i - 1})$. For
each $t_i$, we will draw the pair of noise sequences according to the distribution of $N_{t_i - 1}$ conditioned on
$\hat\nu_{B_k} \ge t_i$ and $\bar L_{1:i-1}$. For the resulting code, which we call $C_k^{S'}$, we define the error events $L'_i$ exactly as
in (72). Next we claim that, for our new scheme, for any $i \in \{1, \ldots, |S_k|\}$ and $t \ge t_i$, we have

$$\Pr\left( L'_i \mid \bar L'_{1:i-1}, \hat\nu_{B_k} \ge t \right) = \Pr\left( L_i \mid \bar L_{1:i-1}, \hat\nu_{B_k} \ge t \right). \qquad (74)$$
To see this, notice that, conditioned on $\hat\nu_{B_k} \ge t$ and $\bar L_{1:i-1}$ in the case of $C_k^S$, and conditioned on
$\hat\nu_{B_k} \ge t$ and $\bar L'_{1:i-1}$ in the case of $C_k^{S'}$, the distribution of the relays' received signals (or perceived
received signals for $C_k^{S'}$) during $[1 : t_i - 1]$ is the same, and, therefore, the distribution of their output
signals during $[t_i : t_i + \ell_k - 1]$ will be the same. Therefore, from equations (73) and (74), we see that
the probability of error of our new code is identical to the probability of error of our previous code,
i.e., $\Pr(\mathrm{error}(C_k^{S'})) = \Pr(\mathrm{error}(C_k^S))$.

Unfortunately, the same cannot be said about the average energy spent by the new code. In particular,
we notice that after the transmission block where the message was sent, the relays will keep operating
based on the noise sequences that were drawn. Therefore, in some sense, the relays are assuming that
the message has not been sent yet, which may cause them to use more energy than in the previous coding
scheme. This is why in the statement of the lemma we only require that the energy spent by the code
up to time $\hat\nu_{B_k} + \ell_k - 1 = \hat\nu_{B_k} + d_{B_k} - 1$ is at most $(1+\eta)\epsilon_b$. In order to be able to bound the
energy used by our code up to time $\hat\nu_{B_k} + d_{B_k} - 1$, we consider making a slight modification to it. If
we let $\gamma_k = \Pr(\mathrm{error}(C_k^{S'}))$, then we will have the source and the relays stay silent in the last
$\sqrt{\gamma_k}\,|S_k|$ transmission blocks. We let $C_k^{S''}$ be the resulting code. If $\hat\nu_{B_k} \ge t_{(1 - \sqrt{\gamma_k})|S_k|}$, this
modification will most likely cause an error, but since $\gamma_k \to 0$ as $k \to \infty$, this only occurs with probability
$\sqrt{\gamma_k} \to 0$, as $k \to \infty$. Thus it is clear that $\Pr(\mathrm{error}(C_k^{S''})) \to 0$ as $k \to \infty$.

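Now let us consider the energy spent by our new code up to time $\hat\nu_{B_k} + d_{B_k} - 1$. First we notice
that, for $i < (1 - \sqrt{\gamma_k})|S_k|$,

$$
\begin{aligned}
E\left[ \mathcal{E}_{C_k^{S''}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]
&\overset{(i)}{=} \sum_{j=1}^{i} E\left[ \mathcal{E}_{C_k^{S''}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right] \\
&\overset{(ii)}{=} \sum_{j=1}^{i} E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i, \bar L_{1:j-1} \right] \\
&\le \sum_{j=1}^{i} \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_i \right)},
\end{aligned}
\qquad (75)
$$

where (i) follows since energy is only spent during the transmission blocks, and (ii) follows because,
from the way we drew our shared noise sequences, the expected energy spent in $[t_j : t_j + d_{B_k} - 1]$
by code $C_k^{S''}$ for $j < (1 - \sqrt{\gamma_k})|S_k|$ is the expected energy spent by code $C_k^S$ in $[t_j : t_j + d_{B_k} - 1]$,
conditioned on $\bar L_{1:j-1}$. For $i \ge (1 - \sqrt{\gamma_k})|S_k|$, we have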



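$$
\begin{aligned}
E\left[ \mathcal{E}_{C_k^{S''}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]
&\overset{(i)}{\le} \sum_{j=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_i \right)} \\
&\overset{(ii)}{=} \sum_{j=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_{(1 - \sqrt{\gamma_k})|S_k|} \right)},
\end{aligned}
\qquad (76)
$$

where (i) follows in the same way as the steps in (75), and (ii) follows because, conditioned on
$\hat\nu_{B_k} = t_i$ with $i \ge (1 - \sqrt{\gamma_k})|S_k|$, or on $\hat\nu_{B_k} = t_{(1 - \sqrt{\gamma_k})|S_k|}$, code $C_k^S$ performs in the same way on any
transmission block $[t_j : t_j + d_{B_k} - 1]$ for $j < (1 - \sqrt{\gamma_k})|S_k|$. We also have that, for $i \ge j$,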







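$$
\begin{aligned}
\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_i \right) &\overset{(i)}{=} \Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} \ge t_i \right) = 1 - \Pr\left( \bigcup_{n=1}^{j-1} L_n \,\middle|\, \hat\nu_{B_k} \ge t_i \right) \\
&\ge 1 - \frac{\Pr(\mathrm{error}(C_k^S))}{\Pr(\hat\nu_{B_k} \ge t_i)} = 1 - \frac{\gamma_k}{\Pr(\hat\nu_{B_k} \ge t_i)},
\end{aligned}
\qquad (77)
$$

where (i) follows since the performance of the code up to transmission block $i - 1$ is the same whether
$\hat\nu_{B_k} = t_i$ or $\hat\nu_{B_k} \ge t_i$. Finally, we obtain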



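$$
\begin{aligned}
E\left[ \mathcal{E}_{C_k^{S''}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \right]
&= \sum_{i=1}^{|S_k|} |S_k|^{-1}\, E\left[ \mathcal{E}_{C_k^{S''}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right] \\
&\overset{(i)}{\le} \sum_{i=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} \sum_{j=1}^{i} |S_k|^{-1}\, \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_i \right)} \\
&\quad + \sum_{i=(1 - \sqrt{\gamma_k})|S_k|}^{|S_k|} \sum_{j=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} |S_k|^{-1}\, \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{\Pr\left( \bar L_{1:j-1} \mid \hat\nu_{B_k} = t_{(1 - \sqrt{\gamma_k})|S_k|} \right)} \\
&\overset{(ii)}{\le} \sum_{i=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} \sum_{j=1}^{i} |S_k|^{-1}\, \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{1 - \gamma_k / \Pr(\hat\nu_{B_k} \ge t_i)} \\
&\quad + \sum_{i=(1 - \sqrt{\gamma_k})|S_k|}^{|S_k|} \sum_{j=1}^{(1 - \sqrt{\gamma_k})|S_k| - 1} |S_k|^{-1}\, \frac{E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right]}{1 - \gamma_k / \Pr\left(\hat\nu_{B_k} \ge t_{(1 - \sqrt{\gamma_k})|S_k|}\right)} \\
&\overset{(iii)}{\le} \frac{1}{1 - \sqrt{\gamma_k}} \sum_{i=1}^{|S_k|} |S_k|^{-1} \sum_{j=1}^{i} E\left[ \mathcal{E}_{C_k^{S}}[t_j : t_j + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right] \\
&= \frac{1}{1 - \sqrt{\gamma_k}} \sum_{i=1}^{|S_k|} |S_k|^{-1}\, E\left[ \mathcal{E}_{C_k^{S}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \,\middle|\, \hat\nu_{B_k} = t_i \right] \\
&= \frac{1}{1 - \sqrt{\gamma_k}}\, E\left[ \mathcal{E}_{C_k^{S}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \right] \le \frac{E\left[ \mathcal{E}_{C_k^{S}} \right]}{1 - \sqrt{\gamma_k}},
\end{aligned}
\qquad (78)
$$

where (i) follows from (75) and (76), (ii) follows from (77), and (iii) follows from the fact that for
$i \le (1 - \sqrt{\gamma_k})|S_k|$, $\Pr(\hat\nu_{B_k} \ge t_i) > \sqrt{\gamma_k}$. Thus, we conclude that

$$\liminf_{k\to\infty} \frac{E\left[ \mathcal{E}_{C_k^{S''}}[1 : \hat\nu_{B_k} + d_{B_k} - 1] \right]}{B_k} \le (1+\eta)\epsilon_b,$$

since $\liminf_{k\to\infty} E\left[ \mathcal{E}_{C_k^{S}} \right] / B_k \le (1+\eta)\epsilon_b$. This concludes the proof.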



□ 
VII Proof of Lemma 6



Lemma 6. Suppose we have a sequence of codes $\{C_k\}_{k=1}^{\infty}$ with non-overlapping transmission blocks
achieving a causal energy-per-bit $\epsilon_b$ on the asynchronous diamond network in Figure 1. Then we can
have a sequence of codes $\{C'_k\}$ that have non-overlapping transmission blocks, achieving a causal
energy-per-bit $(1+\eta)\epsilon_b$ uniformly over the messages, for any $\eta > 0$.

Proof. Consider a sequence of codes {C^'jfLi with non-overlapping transmission blocks achieving a 
causal energy-per-bit e&, and fix some 7] > 0. For a code Ck and each transmission time ti G Sk we can 
define the set of messages 

{m e {1, 2 Bk } : E [£ Ck [W (u Bk )}\ v Bk = U,mis sent ] < (1 + V )E [£c k [W(y Bk )]\ v Bh = U}} 



M 



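To lower bound the size of $\mathcal{M}_{t_i}$, we notice that

\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i\right]
&= \sum_{m=1}^{2^{B_k}} 2^{-B_k} E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i, m \text{ is sent}\right] \\
&\ge \sum_{m \notin \mathcal{M}_{t_i}} 2^{-B_k} E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i, m \text{ is sent}\right] \\
&\ge \sum_{m \notin \mathcal{M}_{t_i}} 2^{-B_k} (1+\eta)\, E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i\right],
\end{aligned}
\]

which implies that

\[
\left|\{1, \dots, 2^{B_k}\} \setminus \mathcal{M}_{t_i}\right| \le \frac{2^{B_k}}{1+\eta}
\quad \Longrightarrow \quad
|\mathcal{M}_{t_i}| \ge \frac{\eta\, 2^{B_k}}{1+\eta}.
\]

Therefore, for each transmission block $W(t_i) = [t_i : t_i + \ell_k - 1]$, we will pick the messages $m$ for which

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i, m \text{ is sent}\right]
\]

have the smallest values. Let $\psi = 1 + 1/\eta$. For each transmission block $W(t_i)$, we fix any bijective mapping $\chi_{k,i}$ from $\{1, \dots, 2^{B_k}/\psi\}$ to the $2^{B_k}/\psi$ messages chosen. Then, for $m \in \{1, \dots, 2^{B_k}/\psi\}$, we must have

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i, \chi_{k,i}(m) \text{ is sent}\right] \le (1+\eta)\, E\left[\mathcal{E}_{\mathcal{C}_k}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i\right]. \tag{79}
\]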
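The message-selection step can be checked numerically. The following is a minimal Python sketch, not part of the original proof: the function name `select_low_energy_messages` and the sample energies are hypothetical, and the per-message expected energies are assumed to be available as a plain list. It keeps the fraction $1/\psi = \eta/(1+\eta)$ of messages with the smallest energy and verifies that every kept message satisfies the bound in (79).

```python
import math


def select_low_energy_messages(energies, eta):
    """Return the indices of the floor(n * eta / (1 + eta)) lowest-energy messages.

    `energies[m]` plays the role of the expected block energy given that
    message m is sent (hypothetical input for this illustration).
    """
    n = len(energies)
    keep = math.floor(n * eta / (1.0 + eta))  # equals n / psi with psi = 1 + 1/eta
    order = sorted(range(n), key=lambda m: energies[m])  # stable sort by energy
    return order[:keep]


# Hypothetical per-message energies for 8 messages.
energies = [1.0, 9.0, 2.0, 3.0, 0.5, 4.0, 1.5, 3.0]
eta = 1.0
mean_energy = sum(energies) / len(energies)

chosen = select_low_energy_messages(energies, eta)

# Every kept message uses at most (1 + eta) times the average energy.
assert all(energies[m] <= (1.0 + eta) * mean_energy for m in chosen)
```

The assertion cannot fail for non-negative energies: by Markov's inequality at most a fraction $1/(1+\eta)$ of the messages can exceed $(1+\eta)$ times the average, so the lowest-energy fraction $\eta/(1+\eta)$ always lies within the bound.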

From code $\mathcal{C}_k$ we can build code $\mathcal{C}'_k$, where $B'_k = B_k - \log\psi$, as the restriction of $\mathcal{C}_k$ according to $\chi_k$, i.e. $\mathcal{C}'_k = \mathcal{C}_k^{\chi_k}$. Recall that, in order to show that $\{\mathcal{C}'_k\}$ achieves a causal energy-per-bit $(1+\eta)e_b$ uniformly over the messages, we need to show that $\{\mathcal{C}'_k\}$ achieves a causal energy-per-bit $(1+\eta)e_b$, and that, for any sequence of restrictions $\{\phi_k\}$, we have (28). Thus, we first consider a restriction of $\mathcal{C}'_k$ according to some arbitrary $\phi_k$, where $\phi_{k,i} : \{1, \dots, M_k\} \to \{1, \dots, 2^{B_k}/\psi\}$, for some $M_k$. Let $W_{t_i} \subseteq \{1, \dots, 2^{B_k}/\psi\}$ be the image of $\phi_{k,i}$. Then, for any transmission time $t_i$, we have



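\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}_k^{\prime\,\phi_k}}[W(\nu_{B_k})] \mid \nu_{B_k} = t_i\right]
&= E\left[\mathcal{E}_{\mathcal{C}'_k}[W(t_i)] \mid \nu_{B_k} = t_i, m \in W_{t_i}\right] \\
&= \sum_{m \in W_{t_i}} M_k^{-1} E\left[\mathcal{E}_{\mathcal{C}_k}[W(t_i)] \mid \nu_{B_k} = t_i, \chi_{k,i}(m) \text{ is sent}\right] \\
&\overset{(i)}{\le} (1+\eta) \sum_{m \in W_{t_i}} M_k^{-1} E\left[\mathcal{E}_{\mathcal{C}_k}[W(t_i)] \mid \nu_{B_k} = t_i\right] \\
&= (1+\eta)\, E\left[\mathcal{E}_{\mathcal{C}_k}[W(t_i)] \mid \nu_{B_k} = t_i\right],
\end{aligned} \tag{80}
\]

where (i) follows from (79). Therefore, we have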



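\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}_k^{\prime\,\phi_k}}[1 : \nu_{B_k} + \ell_k - 1]\right]
&= \sum_{i=1}^{|\mathcal{S}_k|} |\mathcal{S}_k|^{-1} E\left[\mathcal{E}_{\mathcal{C}_k^{\prime\,\phi_k}}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} = t_i\right] \\
&\overset{(i)}{\le} \sum_{i=1}^{|\mathcal{S}_k|} |\mathcal{S}_k|^{-1} \left( \sum_{j=1}^{i-1} E\left[\mathcal{E}_{\mathcal{C}_k}[W(t_j)] \mid \nu_{B_k} = t_i\right] + (1+\eta)\, E\left[\mathcal{E}_{\mathcal{C}_k}[W(t_i)] \mid \nu_{B_k} = t_i\right] \right) \\
&\le (1+\eta) \sum_{i=1}^{|\mathcal{S}_k|} |\mathcal{S}_k|^{-1} E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} = t_i\right] \\
&= (1+\eta)\, E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right],
\end{aligned} \tag{81}
\]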



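where (i) follows from the fact that, prior to $\nu_{B_k}$, $\mathcal{C}'_k$ performs exactly as $\mathcal{C}_k$, and from (80). Now, from (81) we clearly have that

\[
\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k^{\prime\,\phi_k}}[1 : \nu_{B_k} + \ell_k - 1]\right]}{B_k - \log\psi}
\le (1+\eta) \liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right]}{B_k - \log\psi} = (1+\eta)e_b, \tag{82}
\]

which means that, for any sequence of restrictions $\{\phi_k\}$, (28) is satisfied. Now, in order to see that $\{\mathcal{C}'_k\}$ achieves a causal energy-per-bit $(1+\eta)e_b$, we first notice that if we set $M_k = 2^{B_k}/\psi$ and each $\phi_{k,i}$ to be the identity map, (82) implies that

\[
\liminf_{k\to\infty} \frac{E\left[\mathcal{E}_{\mathcal{C}'_k}[1 : \nu_{B_k} + \ell_k - 1]\right]}{B_k - \log\psi} \le (1+\eta)e_b.
\]

Moreover, the error probability of code $\mathcal{C}'_k$ satisfies

\[
\begin{aligned}
\Pr(\mathrm{error}(\mathcal{C}'_k))
&= \sum_{i=1}^{|\mathcal{S}_k|} \sum_{m=1}^{2^{B_k}/\psi} |\mathcal{S}_k|^{-1} \psi\, 2^{-B_k} \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} = t_i, \chi_{k,i}(m) \text{ is sent}\right) \\
&\le \sum_{i=1}^{|\mathcal{S}_k|} \sum_{m=1}^{2^{B_k}} |\mathcal{S}_k|^{-1} \psi\, 2^{-B_k} \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} = t_i, m \text{ is sent}\right) \\
&= \psi \Pr(\mathrm{error}(\mathcal{C}_k)),
\end{aligned}
\]

which tends to 0 as $k \to \infty$, meaning that $\{\mathcal{C}'_k\}$ achieves a causal energy-per-bit $(1+\eta)e_b$. Thus, we conclude that $\{\mathcal{C}'_k\}$ achieves a causal energy-per-bit $(1+\eta)e_b$ uniformly over the messages. □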
VIII Proof of Lemma 7



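Lemma 7. There exists an $\alpha > 0$ and a non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$, such that

\[
\limsup_{k\to\infty} \Pr\left(\nu_{B_k} \in \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)\right) = 1.
\]

Proof. We assume, by contradiction, that for all $\alpha > 0$ and all non-negative sequences $\{\epsilon_k\}$ with $\epsilon_k \to 0$, $\limsup_{k\to\infty} \Pr(\nu_{B_k} \in \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)) < 1$. Notice that for any $\alpha > 0$ and any non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$, if $t \in \mathcal{S}_k \setminus \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)$, then

\[
\Pr\left( \frac{\mathcal{E}^{(r_1)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \;\middle|\; \nu_{B_k} = t \right) > \epsilon_k. \tag{83}
\]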



\ B k B k 

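Consider some $\alpha$ and some non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$. By assumption, we must have

\[
\limsup_{k\to\infty} \Pr\left(\nu_{B_k} \in \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)\right) < 1 - \delta
\]

for some $\delta \in (0, 1)$. Therefore, for some $k_0$ large enough, $\Pr(\nu_{B_k} \notin \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)) > \delta/2$ if $k \ge k_0$, which implies that the set $\mathcal{S}_k \setminus \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)$ is always non-empty for $k \ge k_0$. In addition, we have that, for code $\mathcal{C}_k$,

\[
\begin{aligned}
\Pr(\mathrm{error}(\mathcal{C}_k)) &\ge \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} \notin \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)\right) \Pr\left(\nu_{B_k} \notin \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)\right) \\
&\ge \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} \notin \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)\right) \delta/2,
\end{aligned}
\]

where $\Pr(\mathrm{error}(\mathcal{C}_k)) \to 0$ as $k \to \infty$. This implies that there exists at least one $t \in \mathcal{S}_k \setminus \mathcal{T}(\alpha, \mathcal{C}_k, \epsilon_k)$ such that

\[
\Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} = t\right) \le \frac{2\Pr(\mathrm{error}(\mathcal{C}_k))}{\delta} \triangleq \xi_k, \tag{84}
\]

for $k \ge k_0$. Notice that $\xi_k \to 0$ as $k \to \infty$ as well. To generate our contradiction, we will choose $\epsilon_k = \max(\xi_k^{1/4}, 1/B_k)$, which satisfies $\epsilon_k \to 0$ and $\epsilon_k > 0$.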



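By noticing that the message $m$ is independent of $\nu_{B_k}$, from (83), we can write

\[
\begin{aligned}
&\Pr\left( \frac{\mathcal{E}^{(r_1)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ and } \frac{\mathcal{E}^{(r_2)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \;\middle|\; \nu_{B_k} = t \right) \\
&\qquad = \sum_{m=1}^{2^{B_k}} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ for } i = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) > \epsilon_k.
\end{aligned} \tag{85}
\]

Next we define the set of messages

\[
\mathcal{M}_t = \left\{ m : \Pr\left( \frac{\mathcal{E}^{(r_i)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ for } i = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) > \frac{\epsilon_k}{2} \right\}, \tag{86}
\]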
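and, from (85) we have

\[
\begin{aligned}
\epsilon_k &< \sum_{m \in \mathcal{M}_t} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ for } i = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) \\
&\quad + \sum_{m \notin \mathcal{M}_t} 2^{-B_k} \Pr\left( \frac{\mathcal{E}^{(r_i)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \text{ for } i = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) \\
&\le \sum_{m \in \mathcal{M}_t} 2^{-B_k} + \sum_{m \notin \mathcal{M}_t} 2^{-B_k} \epsilon_k/2 \\
&= 2^{-B_k}|\mathcal{M}_t| + \left(1 - |\mathcal{M}_t| 2^{-B_k}\right)\epsilon_k/2,
\end{aligned}
\]

from which we conclude that

\[
\frac{\epsilon_k}{2} < 2^{-B_k}|\mathcal{M}_t|\left(1 - \frac{\epsilon_k}{2}\right)
\quad \Longrightarrow \quad
|\mathcal{M}_t| > \frac{\epsilon_k 2^{B_k}}{2 - \epsilon_k}. \tag{87}
\]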



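Now we can write

\[
\begin{aligned}
\xi_k &\ge \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} = t\right) \\
&\ge \Pr\left( \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \;\middle|\; \nu_{B_k} = t \right)
\times \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right) \\
&> \epsilon_k \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right),
\end{aligned}
\]

and we conclude that we have

\[
\Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right) < \frac{\xi_k}{\epsilon_k} \le \xi_k^{3/4}.
\]

We can now write

\[
\begin{aligned}
\xi_k^{3/4} &> \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right) \\
&= \sum_{m=1}^{2^{B_k}} \Pr\left( m \text{ is sent} \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right) \\
&\qquad \times \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2,\ m \text{ is sent} \right).
\end{aligned} \tag{88}
\]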
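Next we notice that, if $m \in \mathcal{M}_t$, we have

\[
\begin{aligned}
&\Pr\left( m \text{ is sent} \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \right) \\
&\qquad = \frac{\Pr\left( \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) \Pr\left(m \text{ is sent} \mid \nu_{B_k} = t\right)}{\Pr\left( \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \;\middle|\; \nu_{B_k} = t \right)} \\
&\qquad \ge \Pr\left( \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2 \;\middle|\; \nu_{B_k} = t, m \text{ is sent} \right) \Pr\left(m \text{ is sent} \mid \nu_{B_k} = t\right) \\
&\qquad > \frac{\epsilon_k}{2} \Pr\left(m \text{ is sent} \mid \nu_{B_k} = t\right) = \frac{\epsilon_k}{2} 2^{-B_k} \ge \frac{\xi_k^{1/4}}{2} 2^{-B_k},
\end{aligned}
\]

and, thus, from (88), we obtain

\[
\begin{aligned}
\xi_k^{3/4} &> \sum_{m \in \mathcal{M}_t} \frac{\xi_k^{1/4}}{2} 2^{-B_k} \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2,\ m \text{ is sent} \right) \\
\Longrightarrow \quad 2\xi_k^{2/4} 2^{B_k} &> \sum_{m \in \mathcal{M}_t} \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2,\ m \text{ is sent} \right).
\end{aligned} \tag{89}
\]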



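Now if we let

\[
\mathcal{M}'_t = \left\{ m \in \mathcal{M}_t : \Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2,\ m \text{ is sent} \right) < \frac{4(2 - \epsilon_k)\xi_k^{2/4}}{\epsilon_k} \right\},
\]

we obtain

\[
2\xi_k^{2/4} 2^{B_k} > \frac{4(2 - \epsilon_k)\xi_k^{2/4}}{\epsilon_k} |\mathcal{M}_t \setminus \mathcal{M}'_t|
\quad \Longrightarrow \quad
\frac{\epsilon_k 2^{B_k}}{2(2 - \epsilon_k)} > |\mathcal{M}_t \setminus \mathcal{M}'_t|
\quad \Longrightarrow \quad
|\mathcal{M}'_t| > \frac{\epsilon_k 2^{B_k}}{2(2 - \epsilon_k)} \ge \frac{2^{B_k}}{4B_k},
\]

where the last implication follows from (87). Moreover, notice that, for $m \in \mathcal{M}'_t$, we have

\[
\Pr\left( \mathrm{error}(\mathcal{C}_k) \;\middle|\; \nu_{B_k} = t, \frac{\mathcal{E}^{(r_j)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha,\ j = 1, 2,\ m \text{ is sent} \right) < \frac{4(2 - \epsilon_k)\xi_k^{2/4}}{\epsilon_k} \le 8\xi_k^{1/4}, \tag{90}
\]

which goes to 0 as $k \to \infty$.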

In order to generate our contradiction, we consider using this sequence of codes in the synchronous channel shown in Figure 10. For each $k \ge k_0$, we can find an arrival time $t$, as described before, and a subset of messages $\mathcal{M}'_t$ containing at least $2^{B_k}/4B_k$ messages, each satisfying (86) and (90), when used over the original asynchronous channel. In this synchronous channel, our source S will receive a message chosen uniformly at random from $\mathcal{M}'_t$, and then play the role of both relays, since it possesses two separate antennas with channel gains $\sqrt{h_1}$ and $\sqrt{h_2}$ to the destination. However, we change the scheme so that the source only needs to transmit what the relays would have transmitted during the transmission block $[t : t + d_{B_k} - 1]$.

Figure 10: Synchronous channel considered.

For a randomly selected message $m \in \mathcal{M}'_t$, the source proceeds as follows. It draws two signal sequences of length $A_k + d_{B_k} - 1$ using the joint distribution of the transmit signals of the two relays when code $\mathcal{C}_k$ is used in the diamond network, conditioned on $\nu_{B_k} = t$ and $m$ being sent. If the resulting signals satisfy $\mathcal{E}^{(r_1)}_{\mathcal{C}_k}[W(\nu_{B_k})] < \alpha B_k$ and $\mathcal{E}^{(r_2)}_{\mathcal{C}_k}[W(\nu_{B_k})] < \alpha B_k$, then the source transmits the signals corresponding to the transmission block $[t : t + d_{B_k} - 1]$ from each of them over their corresponding antennas. Otherwise, the source repeats the process, until such transmit signals are found. Notice that (86) guarantees that such a pair of transmit signals will eventually be found.
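The repeat-until-accepted step is a form of rejection sampling. The following is a minimal Python sketch under stated assumptions: `draw_until_low_energy`, `energy`, and `toy_sampler` are hypothetical names, signals are modeled as lists of real samples, and the Gaussian toy sampler merely stands in for the relays' joint transmit distribution, which the proof does not specify in closed form.

```python
import random


def energy(signal):
    """Energy of a real-valued signal: the sum of squared samples."""
    return sum(x * x for x in signal)


def draw_until_low_energy(sample_pair, alpha, num_bits, max_tries=100_000):
    """Redraw a pair of relay signals until both have energy below alpha * num_bits.

    `sample_pair` is any zero-argument callable returning (signal_1, signal_2).
    Returns the accepted pair and the number of draws that were needed.
    """
    budget = alpha * num_bits
    for attempt in range(1, max_tries + 1):
        x1, x2 = sample_pair()
        if energy(x1) < budget and energy(x2) < budget:
            return x1, x2, attempt
    raise RuntimeError("no admissible pair found within max_tries")


# Hypothetical stand-in for the joint distribution of the two relays' signals.
rng = random.Random(0)


def toy_sampler(length=4, scale=0.1):
    first = [rng.gauss(0.0, scale) for _ in range(length)]
    second = [rng.gauss(0.0, scale) for _ in range(length)]
    return first, second


x1, x2, tries = draw_until_low_energy(toy_sampler, alpha=0.5, num_bits=8)
```

Because each draw is accepted with probability greater than $\epsilon_k/2 > 0$ for messages in $\mathcal{M}_t$, the number of draws is geometrically distributed and the loop terminates with probability one; the `max_tries` guard is only a practical safeguard in this sketch.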

It is important to notice that the fact that our original sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ had non-overlapping transmission blocks guarantees that the destination applies its decoder for transmission block $[t : t + d_{B_k} - 1]$ based only on the signals received during this interval (and not on signals received during $[1 : t - 1]$). Therefore, the error probability only depends on the signals transmitted by the relays during $[t : t + d_{B_k} - 1]$. This allows us to conclude, from (90), that the error probability of this scheme, for any chosen message $m$, is upper bounded by $8\xi_k^{1/4}$, and $8\xi_k^{1/4} \to 0$ as $k \to \infty$. The energy-per-bit achieved by this sequence of codes is given by

\[
\liminf_{k\to\infty} \frac{E[\mathcal{E}_{\mathcal{C}_k}]}{\log|\mathcal{M}'_t|} \le \lim_{k\to\infty} \frac{2\alpha B_k}{B_k - \log 4B_k} = 2\alpha.
\]

However, since $\alpha > 0$ can be chosen arbitrarily small, this is a contradiction to the fact that the channel in Figure 10 has a positive minimum energy-per-bit. Therefore, we conclude that, for any sequence of asynchronous codes for the diamond network with error probability going to 0, for some $\alpha > 0$ and some non-negative sequence $\{\epsilon_k\}$ such that $\epsilon_k \to 0$, we must have (30) satisfied, which concludes the proof. □



IX Proof of Lemma 8 



Lemma 8. Suppose we have a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ achieving a finite energy-per-bit $e_b$ on the asynchronous diamond network in Figure 1. Consider any $\alpha > 0$ and any non-negative sequence $\{\epsilon_k\}$, with $\epsilon_k \to 0$. Then, for any $\eta > 0$, we can have a sequence of codes $\{\mathcal{C}'_k\}$ achieving a causal energy-per-bit $(1+\eta)e_b$ uniformly over the messages that have non-overlapping transmission blocks, and for which one of the following is true:

(a) $\limsup_{k\to\infty} \Pr(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)) = 1$,

(b) $\liminf_{k\to\infty} \Pr(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)) = 0$,

where $\mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)$ is defined in (38).

Proof. Fix any small $\eta > 0$, any $\alpha > 0$ and any non-negative sequence $\{\epsilon_k\}$ with $\epsilon_k \to 0$ as $k \to \infty$. From Lemma 5, we know that the original sequence of codes can be converted into another sequence of codes with non-overlapping transmission blocks, achieving a causal energy-per-bit $(1+\eta)e_b$ uniformly over the messages. Thus we will assume that our original sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ already satisfies these properties, and has a transmission block length $\ell_k$. Notice that, if the set of transmission times is
given by S k , our delay for {C k } k *L l is at most 2 ^ + £ k , which must be subexponential in B k . Now, 
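suppose we have

\[
\limsup_{k\to\infty} \Pr\left(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right) = \overline{\gamma}
\quad \text{and} \quad
\liminf_{k\to\infty} \Pr\left(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right) = \underline{\gamma},
\]

where $0 < \underline{\gamma} \le \overline{\gamma} < 1$. Let $\delta = \frac{1}{2}\min(\underline{\gamma}, 1 - \overline{\gamma})$. Then, for all $k$ large enough, we must have

\[
\Pr\left(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right) > \delta
\quad \text{and} \quad
\Pr\left(\nu_{B_k} \notin \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right) > \delta. \tag{91}
\]

Now, for each $k$, notice that, since

\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right]
&= E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right] \Pr\left(\nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right) \\
&\quad + E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \notin \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right] \Pr\left(\nu_{B_k} \notin \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right),
\end{aligned}
\]

we must either have

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right] \le E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right]
\]

or

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \notin \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)\right] \le E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right].
\]

In the former case, we will define $T_k$ to be the set of the $\delta|\mathcal{S}_k|$ effective arrival times $t$ from $\mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)$ with the smallest values of

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} = t\right].
\]

In the latter case, we will define $T_k$ to be the set of the $\delta|\mathcal{S}_k|$ effective arrival times $t \in \{t_1, \dots, t_{|\mathcal{S}_k|}\} \setminus \mathcal{T}_2(\alpha, \mathcal{C}_k, \epsilon_k)$ with the smallest values of

\[
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} = t\right].
\]

Notice that the sequence $T_k$ satisfies

\[
\Pr(\nu_{B_k} \in T_k) = \delta > 0,
\quad \text{and} \quad
E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \in T_k\right] \le E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right] \tag{92}
\]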

for all $k$ large enough. Next, we use code $\mathcal{C}_k$ to build code $\mathcal{C}'_k$ in the following way. Code $\mathcal{C}'_k$ will have $\delta|\mathcal{S}_k|$ transmission times. Notice that, from Definition 3, $t_{i+1} - t_i \ge \ell_k + 1$, which implies $|\mathcal{S}_k| < A_k/\ell_k$. Therefore, we have $\frac{A_k}{\delta|\mathcal{S}_k|} > \frac{A_k}{|\mathcal{S}_k|} > \ell_k$, and, if we choose our $\delta|\mathcal{S}_k|$ transmission times to be $t'_i = \frac{iA_k}{\delta|\mathcal{S}_k|}$, $i = 1, \dots, \delta|\mathcal{S}_k|$, there will be strictly more than $\ell_k$ time steps in between two consecutive transmission times. We will perform a mapping from the $\delta|\mathcal{S}_k|$ transmission times in $T_k$ to the new transmission times $t'_i = \frac{iA_k}{\delta|\mathcal{S}_k|}$, $i = 1, \dots, \delta|\mathcal{S}_k|$. We will choose this mapping to preserve the order of the original transmission times in $T_k$. The source will now start the transmission of any message received in $\left[\frac{(i-1)A_k}{\delta|\mathcal{S}_k|} + 1 : \frac{iA_k}{\delta|\mathcal{S}_k|}\right]$ at time $t'_i = \frac{iA_k}{\delta|\mathcal{S}_k|}$. Moreover, if $t_j \in T_k$ is mapped to $t'_i$, then the encoding functions, relaying functions and decoding functions used at times $[t'_i : t'_i + \ell_k - 1]$ will be the functions used in the original scheme during times $[t_j : t_j + \ell_k - 1]$. At any time $t \notin \bigcup_{i=1}^{\delta|\mathcal{S}_k|} [t'_i : t'_i + \ell_k - 1]$, source, relays and destination will be inactive. Notice that what allows us to perform this remapping of transmission blocks is property 3 in Definition 3, which guarantees a sort of "independence" among the blocks.
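The order-preserving remapping of transmission times can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' construction verbatim: the function name `remap_transmission_times` is hypothetical, `horizon` plays the role of $A_k$, `block_len` plays the role of $\ell_k$, and integer division replaces the exact ratio $A_k/(\delta|\mathcal{S}_k|)$.

```python
def remap_transmission_times(selected_times, horizon, block_len):
    """Map the selected transmission times onto an evenly spaced grid.

    The i-th smallest selected time is sent to i * spacing, where spacing is
    an integer stand-in for A_k / (delta * |S_k|). The order of the original
    times is preserved, and the spacing must exceed the block length so that
    the relocated blocks cannot overlap.
    """
    n = len(selected_times)
    spacing = horizon // n
    if spacing <= block_len:
        raise ValueError("blocks would overlap: spacing must exceed block_len")
    ordered = sorted(selected_times)
    return {old: (i + 1) * spacing for i, old in enumerate(ordered)}


# Hypothetical example: three selected times in a horizon of 60 slots.
mapping = remap_transmission_times([40, 3, 10], horizon=60, block_len=5)
```

In this example the selected times 3, 10 and 40 are sent to 20, 40 and 60, so consecutive blocks of length 5 are separated by 20 slots. The block contents themselves are reused unchanged, which is what property 3 in Definition 3 permits.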






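For code $\mathcal{C}'_k$, the decoding delay will be at most $\frac{A_k}{\delta|\mathcal{S}_k|} + \ell_k$, which is subexponential in $B_k$. It is not difficult to see that code $\mathcal{C}'_k$ performs with an error probability not greater than the error probability of code $\mathcal{C}_k$ if the effective arrival distribution had been, instead of $\nu_{B_k}$, a new effective distribution $\nu'_{B_k}$, such that $\Pr(\nu'_{B_k} = t) = \frac{1}{\delta|\mathcal{S}_k|}$ if $t \in T_k$ and $\Pr(\nu'_{B_k} = t) = 0$ otherwise. Thus, for code $\mathcal{C}'_k$, using (92), we have

\[
\Pr(\mathrm{error}(\mathcal{C}'_k)) \le \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} \in T_k\right) \le \frac{\Pr(\mathrm{error}(\mathcal{C}_k))}{\Pr(\nu_{B_k} \in T_k)} = \frac{\Pr(\mathrm{error}(\mathcal{C}_k))}{\delta},
\]

which goes to 0 as $k \to \infty$, since $\delta$ is a positive constant. Moreover, if $t_j \in T_k$ is mapped to $t'_i$, then we have

\[
\Pr\left( \frac{\mathcal{E}^{(r_2)}_{\mathcal{C}_k}[W(\nu_{B_k})]}{B_k} < \alpha \;\middle|\; \nu_{B_k} = t_j \right)
= \Pr\left( \frac{\mathcal{E}^{(r_2)}_{\mathcal{C}'_k}[W(\nu'_{B_k})]}{B_k} < \alpha \;\middle|\; \nu'_{B_k} = t'_i \right). \tag{93}
\]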



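This is the case since, conditioned on $\nu_{B_k} = t_j$, the distribution of the transmit signals of relay 2 in $W(\nu_{B_k})$ using code $\mathcal{C}_k$ is the same as the distribution, conditioned on $\nu'_{B_k} = t'_i$, of the transmit signals of relay 2 in $W(\nu'_{B_k})$ when using code $\mathcal{C}'_k$. Furthermore, it is easy to see that $\mathcal{C}'_k$ also achieves a causal energy-per-bit $(1+\eta)e_b$ uniformly over the messages. For our new code $\mathcal{C}'_k$, the set $\mathcal{T}_2(\alpha, \mathcal{C}'_k, \epsilon_k)$ is defined in terms of the new effective arrival distribution $\nu'_{B_k}$. It is then not difficult to see that we will have, for each $B_k$, either

\[
\Pr\left(\nu'_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}'_k, \epsilon_k)\right) = 0
\quad \text{or} \quad
\Pr\left(\nu'_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}'_k, \epsilon_k)\right) = 1,
\]

depending on how we chose $T_k$. This clearly implies that

\[
\liminf_{k\to\infty} \Pr\left(\nu'_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}'_k, \epsilon_k)\right) = 0
\quad \text{or} \quad
\limsup_{k\to\infty} \Pr\left(\nu'_{B_k} \in \mathcal{T}_2(\alpha, \mathcal{C}'_k, \epsilon_k)\right) = 1. \tag{94}
\]

Moreover, from (92), the expected energy used by $\mathcal{C}'_k$ up to $\nu'_{B_k} + \ell_k - 1$ satisfies

\[
E\left[\mathcal{E}_{\mathcal{C}'_k}[1 : \nu'_{B_k} + \ell_k - 1]\right] \le E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1] \mid \nu_{B_k} \in T_k\right] \le E\left[\mathcal{E}_{\mathcal{C}_k}[1 : \nu_{B_k} + \ell_k - 1]\right],
\]

which concludes the proof. □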



X Proof of Lemma 9 

Lemma 9. Consider the network shown in Figure 8 in the asynchronous setting. Suppose a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ satisfies (60) and achieves a finite energy-per-bit. Then we must have

liminf > 7 (1 + p)(± + ±-)- /(a), 

where $f(\alpha)$ is a function satisfying $f(\alpha) \to 0$ as $\alpha \to 0$.

Proof. We start by considering the network in Figure 11 in the asynchronous setting, where we assume that there is a constraint of the form (60) on antenna $A_2$. Consider any sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ for this network achieving a finite energy-per-bit with delay $d_{B_k}$. The expected energy used by code $\mathcal{C}_k$ can be written as

\[
\begin{aligned}
E[\mathcal{E}_{\mathcal{C}_k}] &= E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2] \Pr(\nu_{B_k} \le A/2) + E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} > A/2] \Pr(\nu_{B_k} > A/2) \\
&= \tfrac{1}{2} E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2] + \tfrac{1}{2} E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} > A/2].
\end{aligned}
\]

Figure 11: Network with parallel channels.



Thus we must have either $E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2] \le E[\mathcal{E}_{\mathcal{C}_k}]$ or $E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} > A/2] \le E[\mathcal{E}_{\mathcal{C}_k}]$. Suppose the former case without much loss of generality. Then we will modify code $\mathcal{C}_k$ to obtain a code $\mathcal{C}'_k$ that only uses $A_1$ in the following way. An arrival time $\nu'_{B_k} \in \{2t - 1, 2t\}$ will correspond to an arrival time $\nu_{B_k} = t$ in the original scheme, for $t = 1, \dots, A/2$. If a message arrives at time $\nu'_{B_k} \in \{2t - 1, 2t\}$, the sequence of $d_{B_k}$ transmit signals on antenna $A_1$ will be sent at times $2t + 1, 2(t+1) + 1, 2(t+2) + 1, \dots, 2(t + d_{B_k}) + 1$. The sequence of $d_{B_k}$ transmit signals that should be sent over antenna $A_2$ according to code $\mathcal{C}_k$ will be sent on antenna $A_1$ multiplied by a factor $\sqrt{h_2/g_1}$ at times $2t + 2, 2(t+1) + 2, 2(t+2) + 2, \dots, 2(t + d_{B_k}) + 2$. Now the destination can simply interpret the signals received at times $2t + 1$ and $2t + 2$, for $t = 1, \dots, A/2$, as the signals received on antennas $A_1$ and $A_2$ respectively. With this interpretation of the received signals, the destination can apply the same decoder from code $\mathcal{C}_k$. The delay of the new code is at most $d'_{B_k} = 2d_{B_k} + 2$, and its error probability satisfies

\[
\Pr(\mathrm{error}(\mathcal{C}'_k)) = \Pr\left(\mathrm{error}(\mathcal{C}_k) \mid \nu_{B_k} \le A/2\right) \le 2\Pr(\mathrm{error}(\mathcal{C}_k)),
\]

and also tends to 0 as $k \to \infty$. The energy used by code $\mathcal{C}'_k$ satisfies



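\[
E\left[\mathcal{E}_{\mathcal{C}'_k}\right] = E\left[\mathcal{E}^{(A_1)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right] + \frac{h_2}{g_1} E\left[\mathcal{E}^{(A_2)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right]. \tag{95}
\]

In the case where $h_2 \le g_1$, (95) implies that

\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}'_k}\right] &\le E\left[\mathcal{E}^{(A_1)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right] + E\left[\mathcal{E}^{(A_2)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right] \\
&= E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2] \le E[\mathcal{E}_{\mathcal{C}_k}].
\end{aligned} \tag{96}
\]

In the case where $h_2 > g_1$, we notice that, since code $\mathcal{C}_k$ must satisfy (60), we must have

\[
3\alpha B_k \ge E\left[\mathcal{E}^{(A_2)}_{\mathcal{C}_k}\right]
\]

and, thus,

\[
6\alpha B_k \ge E\left[\mathcal{E}^{(A_2)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right]. \tag{97}
\]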
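The single-antenna construction of $\mathcal{C}'_k$ used in this proof, and the energy expression (95), can be illustrated with a minimal Python sketch. The names `interleave_on_single_antenna` and `total_energy` are hypothetical, and the amplitude factor $\sqrt{h_2/g_1}$ is an assumption inferred from the energy factor $h_2/g_1$ appearing in (95).

```python
import math


def interleave_on_single_antenna(signal_a1, signal_a2, t, h2, g1):
    """Time-interleave the two antennas' signals onto antenna A1 only.

    The n-th symbol meant for A1 is sent at time 2(t + n) + 1, and the n-th
    symbol meant for A2 is sent at time 2(t + n) + 2 after scaling by
    sqrt(h2 / g1) (assumed factor, chosen so the received SNR is preserved).
    Returns a dict mapping each time slot to the transmitted amplitude.
    """
    scale = math.sqrt(h2 / g1)
    schedule = {}
    for n, (x1, x2) in enumerate(zip(signal_a1, signal_a2)):
        schedule[2 * (t + n) + 1] = x1
        schedule[2 * (t + n) + 2] = scale * x2
    return schedule


def total_energy(schedule):
    """Total transmit energy of a schedule: the sum of squared amplitudes."""
    return sum(x * x for x in schedule.values())


# Hypothetical two-symbol signals with h2 / g1 = 4.
schedule = interleave_on_single_antenna([1.0, 2.0], [3.0, 4.0], t=1, h2=4.0, g1=1.0)
```

In this example the energy of the interleaved schedule equals the energy on $A_1$ plus $h_2/g_1$ times the energy on $A_2$, which mirrors the structure of (95).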



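Now by multiplying (97) by $(1 - h_2/g_1)$ (which is negative) and adding the resulting inequality to (95), we obtain

\[
\begin{aligned}
E\left[\mathcal{E}_{\mathcal{C}'_k}\right] - (h_2/g_1 - 1)\, 6\alpha B_k &\le E\left[\mathcal{E}^{(A_1)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right] + E\left[\mathcal{E}^{(A_2)}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2\right] \\
&= E[\mathcal{E}_{\mathcal{C}_k} \mid \nu_{B_k} \le A/2] \le E[\mathcal{E}_{\mathcal{C}_k}].
\end{aligned} \tag{98}
\]

By combining (96) and (98) we can write, for all $h_2$ and $g_1$,

\[
E\left[\mathcal{E}_{\mathcal{C}'_k}\right] - \gamma \alpha B_k \le E[\mathcal{E}_{\mathcal{C}_k}], \tag{99}
\]

where $\gamma = 6\max[0, h_2/g_1 - 1]$. Since $\{\mathcal{C}'_k\}$ is a sequence of codes for a point-to-point channel achieving a finite energy-per-bit, we know from Theorem 1 that it must satisfy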



and, thus, 



E 



lim inf 

k— >co 



Bi 



> 



7(1 + P) 



lim inf — „ > lim inf 



.91 



7 (1 + 
7a > jot. 

91 



(100) 



Next we consider the network in Figure 12, where we again assume that there is a constraint of the form (60) on antenna $A_2$. Consider a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ that achieves a finite energy-per-bit on this network. In order for us to lower bound the energy-per-bit of this sequence of codes, we will notice that any sequence of codes of this network can be used on the network in Figure 13(a).

Figure 12: Two-input one-output network.

Figure 13: Networks with parallel channels.

The network in Figure 13(a) is a network with two parallel channels just as the network in Figure 11, except that the additive Gaussian noises at the two receivers have variances $1 - \delta$ and $\delta$, for some $\delta \in (0, 1)$. Any sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ for the network in Figure 12 can be directly used in the network in Figure 13(a). The only modification that needs to be made is to have the destination add the signals received on each of its two receivers. After this addition, the network effectively becomes the network from Figure 12. Moreover, it is easy to see that the network from Figure 13(a) is entirely equivalent to the network in Figure 13(b), since the SNR on each channel is the same.

Therefore, we can use our previous reasoning to lower bound the energy-per-bit achieved by the sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$. From (100), we see that



lim inf 

k— >oo 



E[£ c 



> 



7 (l + /3)(l-£) 
hi 



7'a, 



(101) 
where $\gamma' = 6\max\left[0, \frac{h_2(1-\delta)}{h_1\delta} - 1\right]$, and $\delta \in (0, 1)$ is a free parameter that we can optimize over. We will choose $\delta = \min[1/2, \sqrt{\alpha}]$. Then we obtain



E[£c k ] > 



liminf 

k— >00 Bp. 



> 



> 



7(1 +0) 7(l + 0)<5 

hi hi 

7(1 + 7(1 + 

7(1 + 7(1 + /3)^ 



— 6 max 
/12a 



a 



6 



(102) 



Now we are ready to prove the Lemma. Suppose we have a sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ for the network in Figure 8 under the additional constraint (60). We first notice that these codes can be applied to the network in Figure 11. In order to do that, we would have the destination considering its upper receiver to be relay 1, and then simulating what relay 1 would have transmitted and adding that to the received signals at the lower receiver. The energy used when applying code $\mathcal{C}_k$ from the network in Figure 8 on the network in Figure 11 is just the energy that $\mathcal{C}_k$ would consume on $A_1$ and $A_2$. Therefore, from (100), we have that



E 



lim inf 

k— ¥00 



£, 



(M 



+ E 



£, 



{M 



> 



7(1 + 0) 
7a.



(103) 



Similarly, we notice that the sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$ for the network in Figure 8 can be applied to the network in Figure 12. This time, the source from Figure 12 computes what the source from Figure 8 would have transmitted over $A_1$ and simulates what relay 1 would receive and transmit. Then it transmits the simulated outputs of relay 1 over $A_1$. The signals transmitted on $A_2$ would be the same in both cases. The energy consumed when using code $\mathcal{C}_k$ on the network in Figure 12 is the energy that relay 1 and source antenna $A_2$ would consume. Therefore, from (102), we obtain



E 



lim inf ■ 



e, 



in) 



+ E 



4 A2) 



.7(1 + 0) 7(l + £)v^ W2a + v^) 

> : : 0- 



Bk hi hi hi 

In order to lower bound the total energy-per-bit of the sequence of codes $\{\mathcal{C}_k\}_{k=1}^{\infty}$, we first compute



(104) 



E 



lim inf 

k— ¥00 



An) 
-c k 



+ E 



AM' 



+ 2E 



AM 



E 



Bk 

AM 



+ E 



> lim inf 

fe— >oo 

(i) 7(1 + 

> 7a + 

9i 

= 7(1 + 0) f- + ^ 
\9i hi 



AM 
-c k 



E 



r . + lim inf- r> 

Dfc k— >oo Hp. 

7(1 + 0) 7(1 + 0)V^ M2a + Ja) 



An) 
-C k 



+ E 



AM 
-c k 



hi 

— 7a — 



hi 



hi hi 
7(l + /3) v /a _ g /i 2 (2a + y/a) 



hi 



(105) 
where (i) follows from (103) and (104). We also have that



E 



lim inf- 

k— >oo 



p(n) 



+ E 



(Ai) 



+ 2E 



C(A2) 

-c k 



E 



< lim inf 

k— ¥00 



s. 



Bk 

(n) 



+ E 



£. 



Ck 



+ E 



S. 



c k 



E 



+ lim sup 

k—^oo 



s. 



(M) 



Bk 



,. . t E[£ Ck ] E 
= lim mi — - + lim sup — 



8 {M) 
c k 



+ 3a, 



Bk 



(0 

< lim inf 

k—^oo Bk 



E[£c k ] 



(106) 



where (i) follows from the constraint (60). Finally, by combining (105) and (106) we obtain 



lim inf > 7 (i + p)(L + ±. 

k^oo Bk \gi hi 



/(«), 



where 



J (a) = 7a H h 6 h 3a, 



which clearly satisfies $f(\alpha) \to 0$ as $\alpha \to 0$.



□ 